FOREWORD


This is the fourth issue of the Review devoted entirely to a consideration
of the general field of research methods and technics and of appraisal
in education. The general plan of this issue is patterned on that used in 
the third cycle published in December 1942. Unfortunately, the plan 
was not carried out completely. The primary deficiency is the lack of a 
chapter reviewing the important fields of the logic of research
(operationalism, symbolism, and semantics),¹ procedures in reporting research,
and the evaluation and implications of research in education. 

Rationalization for the deviation between plan and product is too easy. 
All the contributors carried a heavy burden during the war period. Some, 
in their capacity as consultants in the war effort, were so overwhelmed 
by their tasks that they felt that they could not carry to completion the 
promised reviews. The majority of the promised reviews, however, were 
completed with the assistance of other reviewers. Particularly, it should 
be pointed out that Paul Blommers assumed the difficult task of reviewing
the recent developments in statistical theory for the period from January 
1943 to July 1945. 

All other chapters of this issue cover the period from July 1942 to 
July 1945. The reviews may seem unduly brief. Their brevity is due
entirely to space restrictions occasioned by the paper shortage. Whereas 
the December 1942 issue was allowed 120 pages, the current issue was 
restricted to about 90 pages. 

Research technics and methods, as represented by the three-year cycle, 
are continuing to develop and expand. Perhaps the area that has developed 
most rapidly, and sometimes too rapidly for education to keep abreast 
of it, is the field of statistical methods. The published literature is a rich 
resource for the research worker in education, but the unpublished advances 
will be even richer. During the war, great developments have been made 
in this field. A very significant new development is the work of Wald and 
his co-workers in the statistical research group at Columbia University 
on “Sequential Analysis of Statistical Data” both in theory and in
applications.² Other developments made in connection with the war effort soon
will be published. 

It is a pleasure to extend thanks to the committee that planned this 
issue, and to the contributors who gave reality to the plan. 


IRVING LORGE
Chairman 





¹ It is significant that the September 1945 number of the Psychological Review devotes its entire issue
to a “Symposium on Operationism” (Vol. 52, No. 5, p. 241-94).

² The first publication of this material is in the Annals of Mathematical Statistics 16:117-86; June 1945.
Other articles will be published in the near future in the same journal. 
CHAPTER I


Library Resources and Documentary Research 


DOUGLAS E. SCATES 


To persons of direct action who find satisfaction predominantly in the 
dynamics of face-to-face relations, documents have slight appeal. To per- 
sons of deliberative temper, the contents of written records are both the 
means of order and the basis of progress. The recorded fact and the 
written communication are not only essential instruments for the avoidance 
of chaos in a complex society, but they span the reaches of space and 
time and enable the thinker and the discoverer to transmit carefully 
observed conditions and thoughtfully developed insights to those who may 
be at a considerable distance or in a later period. Thus each worker— 
each generation—is spared the necessity for beginning all over again. 
Thru documents the isolation of individuals far apart in space or time 
is changed into opportunity for joint endeavor. We may weary of the 
multitude of books and the incessant stream of paper, yet we know that 
in such materials are found the concepts of the present and the hope 
of the future. 

The present chapter, as indeed this entire issue, is given not so much 
to the products of research as to the means of research. Entries in the 
bibliography represent for the most part discussions of what should be 
done or how it should be done, and they refer to finished studies only 
as examples. The present reviews are therefore somewhat more in the 
nature of guides to literature than is contemplated for the regular issues 
of the Review. 

The scope of the present chapter is, by assignment, the entire range 
of technics and procedures appropriate to working with documentary 
sources. The chapter carries forward the treatment of Good (33) in the 
December 1942 issue of the Review and that given more extensively in 
four chapters of the December 1939 issue. 


BIBLIOGRAPHICAL AIDS AND LIBRARY RESOURCES 


The years of social turmoil, rapid change, and furious activity brought 
by the war have produced many new facts with which to reckon, many 
new ideas to be sifted, and many new and unusual documents to be studied. 
The momentous shifts in the physical and mental world place added 
emphasis on bibliographical tools; without them, the researcher stands 
futilely before a mass of material expanding more rapidly than he can 
work. The field of education is among the most thoroly indexed and 
summarized of the academic disciplines or the practical arts, yielding 
place perhaps only to law. Yet these comprehensive and specialized tools 
are of value only when known about, understood, and used. 
General Library Tools


The writer must confess to some tendency to neglect the more general 
library aids: perhaps because our own education indexes usually offer 
more than we can immediately utilize; perhaps because the general sources 
require more searching and selecting; but probably because many of 
these tools require a degree of maturity in library work which does not 
come in one’s early years. The librarian’s professional books and the 
general reference works often do, however, offer the educator material 
of stimulating breadth and make him feel the narrowness of his con- 
templated approach. They would be valuable if they did nothing more; 
but in some cases they provide the only systematic source of information 
about material which one seeks, and in most cases they offer additional 
material. 

Indexes—Three new volumes, from the H. W. Wilson Company, bring 
two series more nearly up to date and extend a third series further back. 
The Cumulative Book Index, 1938-1942 (18) constitutes the third perma- 
nent supplement to the United States Catalog, fourth edition, published 
in 1928. This cumulation will reduce the number of temporary supple- 
ments one must handle. The Bibliographic Index has issued its first six- 
year cumulation (21) covering 1937-1942. The volume has been augmented 
so that it includes 5000 more bibliographies than appeared originally in 
the quarterly and annual volumes. Approximately 50,000 bibliographies 
are listed on 9260 subjects. Twenty-three pages of bibliographies appear 
under the head, “Education,” or “Educational”; two and one-half pages 
under “Learning, psychology of,” etc. As such, the publication keeps 
Monroe and Shores’ Bibliographies and Summaries in Education up to 
date. However, it lacks author entries. 

The Nineteenth Century Readers’ Guide, 1890-1899 (26) marks a 
serviceable and promising venture in extending backwards the general 
periodical index which has, since 1900, been in widespread use. This 
new two-volume index covers fifty-one periodicals for the last decade 
of the nineteenth century, and in addition, indexes fourteen of these 
beyond 1900 up to the time they were taken into one of the Wilson 
indexes of the present century. Education, the School Review, and the 
National Education Association Proceedings are three educational serials 
which are covered back to 1890. The indexing of earlier decades may be 
undertaken. While the nineteenth century is regarded as the domain of 
Poole’s Index to Periodical Literature, this is so by necessity rather than 
by right. Poole’s index did not use a standard list of subject heads but 
employed catch titles lumped under large heads; it did not give author 
entries, provided no cross references, and offered only incomplete bibliog- 
raphical information. 

Ireland issued An Index to Indexes (48) which provides a serviceable 
list of all kinds of indexes to books and periodicals, by subject field. 

Indexes to legal literature—The H. W. Wilson Company’s Index to Legal 
Periodicals (1) appeared in a three-year cumulation and in a one-year
cumulation during the past triennium. This index appears monthly and 
extends back to 1908; up thru July 1928 it was published directly by the 
American Association of Law Libraries. It provides a subject, author, 
and book review index, and a table of cases. Volume VI of Chipman’s 
Index to Legal Periodical Literature (22) continues a subject and author 
index which dates back to 1886, known as Jones’ Index up to 1898. The 
Legal Periodical Digest (53), which dates back to 1928, is a loose-leaf 
publication which cumulates by subject. While it is a digest rather than 
an index, it provides an index to topics, cases, and authors and thus 
serves as an index somewhat after the fashion of Psychological Abstracts. 

General reference works—The second edition of the Union List of 
Serials (42) must be named first among the reference works of the past 
three years. In conjunction with microfilming, discussed later, this volume
should be of greater value even than its predecessor. The list gives library 
holdings as of January 1941; it covers 650 libraries (three times as many 
as the first edition, published in 1927, as of January 1925) and 115,000 
titles as compared with 75,000 in the first edition. The scope has been 
broadened to include annual summaries of research and numbered mono- 
graph series. Fragmentary listings for Latin-American libraries have 
been ventured. 

Mention should be made of the reproduction, in book form, of the 
Library of Congress card catalog (11) which, as of August 1945, is complete
thru the entry “Rand” in volume 122. The project may be completed within a few months.
These volumes, with current supplements, will take the place of the Library 
of Congress Depository which has formerly been established in a few 
large centers. Similarly, the British Museum General Catalogue of Printed 
Books is appearing in book form; it has reached Clau. Both of these projects 
are author and title catalogs only; entries are not made by subject. 

The United States Quarterly Book List (54) is a new publication, arising 
from a recommendation of the American Conference for the Maintenance 
of Peace in 1936 that the Latin-American nations exchange information 
concerning ideological and technical developments. This list is issued 
to keep one abreast of current publications in the United States on fine 
arts, philosophy, the social sciences, biological sciences, technology, and 
reference works. 

In the field of library guides, a new supplement (1941-43) to Mudge 
and Winchell (56) has been issued. Hutchins (47) has treated reference 
work in a way that is helpful to individuals seeking information, especially 
those desiring current statistical information (41: chapter VIII). Brown 
(17) has issued the fifth edition of a fairly simple outline of library organ- 
ization and procedure. A part of this has been printed separately as 
“Shortcuts to Information.” 

Biographies—A new publication of the A. N. Marquis Company, Who 
Was Who in America, covering 1897-1942 (79) presents biographies of 
deceased persons formerly included in Who’s Who. The twenty-third 
volume of Who’s Who in America (80) and the seventh edition of American
Men of Science (20) have appeared. Current biographies of men in
the news continue to be issued by both the A. N. Marquis Company and
the H. W. Wilson Company.


Bibliographical Aids in Education and Related Fields 


Since in five minutes’ time one can get from the Education Index more 
than he knows what to do with, what is the value of delineating a multi- 
plicity of bibliographical methods and sources? To those who are easily 
satisfied and whose wants are simple, there is no answer; but to those 
who seek a wealth of resources by which to enjoy the varied facets of 
a rich quality and broad outlook in their perspective, no single approach 
will do. Each individual index or reference work has its own particular 
purpose and makes its unique contribution. Only thru knowing about 
the available aids can we obtain high satisfaction. 

Indexes—The Education Index has issued a three-year cumulation (1941-44)
and a one-year volume (19) during this period. The Review of Educational
Research issued a twelve-year index (60) covering 1931-42.
Heretofore workers have had to depend on annual indexes which did not 
maintain a consistent set of subject heads; and the index for 1936 was 
lacking. An annual index on business education (31) has been prepared 
since 1940; it is represented as covering thirty-four periodicals thoroly, 
110 selectively, and including 3800 items. Workers in the social studies 
will welcome the 1941-44 supplement to the index of the National Geo- 
graphic Magazine (58), the permanent index covering 1899-1940. 

An index to abstracts of psychological literature of an earlier date 
(1894-1928) was prepared by Ansbacher (10). This work was incidental 
to a larger indexing project in psychology which was never finished and 
appears to have been abandoned. The larger project planned for an 
adequate and detailed topical indexing of the former Psychological Index 
which grouped references only by comparatively large heads. The incidental
project was the finding of abstracts for the original articles which
the Psychological Index covered; abstracts were found for about half 
of the articles, and these abstracts are now themselves keyed to the original 
Psychological Index. These abstracts are particularly valuable where they 
happen to be more accessible than were the original journals, as for many 
foreign publications. 

Thesis lists—The annual Bibliography of Research Studies in Education 
prepared by the U. S. Office of Education was discontinued with the 
1941 issue (covering 1939-40) for the duration of the war; policy with 
regard to resumption has not been announced, but the Office is continuing 
to gather data on completed master’s and doctor’s theses. Good (34) 
has continued to list doctor’s theses under way and Henry (44), following 
Gilchrist’s earlier editorship, has listed those accepted in all fields. 
Blackwell (14) gave a twenty-five year list for certain educational subjects 
in England. Theses under way in sociology are listed annually (7) as
are those which have been accepted for degrees (6). Knox (51) listed 
theses on Negro education. 

Booklists—A list of practically all the books and monographs published 
on education is prepared each year by the Enoch Pratt Free Library of 
Baltimore (76,77). From these the sixty most outstanding books are 
selected (43). For the past three years, sixty books could not be found for 
the purpose, and the list has contained only 32, 52, and 34 books,
respectively.


Special Bibliographies and Summaries 
of Educational Subjects 


A number of bibliographies and summaries relate to certain areas or 
phases of education. Such bibliographies are correctly the responsibility
of those reviewing the individual areas in which they fall; but bibliog- 
raphies are also the province of the general bibliographer, and it seems 
appropriate to mention here those which appear to be significant. The 
areas covered are: business education research (30); child development 
(40); current status of education (36, 37); curriculum making (52);
employment tests (12); health research (25); modern language teaching
(55, 61, 69, 71); Negro education (50, 59); physical education (25);
reading (9, 13, 36, 41); school buildings (68); sociological research (15, 
16). “Selected References” on twenty phases of education have continued 
in the Elementary School Journal and the School Review. 

Research methodology—Smith (67) published a new text in this field, 
and Whitney’s text (78) went thru another printing with slight altera- 
tions. Good has continued his annual bibliography on research methods 
(39) and has also published two reviews of treatises in this area (38). 
Shannon and Kittle (118) analyzed the contents of eight texts on research 
methods in education. Section 30 of the general list of education books 
(76, 77) is devoted to research and general bibliographical aids. Sarton 
(65) offered a critical bibliography of the history and philosophy of 
science. 


Guides to Educational Films 


Bibliographies of educational motion pictures have not previously been 
given in corresponding treatments of the Review, but this medium of 
recording and of instruction is now so widespread that to ignore it further 
is quite inexcusable. Films both represent research and are the subject 
of research. They have value just as truly as the printed word. 

The H. W. Wilson Company’s Educational Film Catalog is now issued 
monthly, beginning with January 1945. The name of the annual publica- 
tion has been changed to Educational Film Guide (24). The latest edition 
lists 4300 films by subject, most of them having annotations. The largest 
single guide to educational films is the encyclopedia published by the 
American Council on Education (5). This was prepared by the Committee
on Motion Pictures in Education, under the direction of Charles
F. Hoban, Jr., and represents some three years of cooperative work in 
selection and evaluation. Nearly a page is given to the description and 
rating of each film. A war supplement was issued (3). Other guides 
have been prepared (27, 46). The United Nations (29), Latin America 
(4), and vocational training (75) are covered in special indexes. Several 
states which maintain state film libraries have issued catalogs for use 
thruout their school systems. 

Hoban (45) discussed sources of information on films and how to use 
these sources. The National Council of Teachers of Mathematics (57) 
discussed “multi-sensory” aids and listed certain films. The following 
magazines have current sections listing films: Educational Screen, Journal 
of Business Education, Nation’s Schools, Scholastic, and School Manage- 
ment. Other guides will be found in the Education Index under the topic 
“Moving Pictures—Catalogs.” 


Reference Works for Educators 


Dictionaries and encyclopedias—The Dictionary of Education (35),
defining 16,000 terms, sponsored by Phi Delta Kappa and edited by Good,
has appeared. Active work has been under way for six years; a large
number of persons, including many from the American Educational 
Research Association, participated in it. A Dictionary of Sociology (28) 
and two encyclopedias have been issued—the Encyclopedia of Modern 
Education (63) and the Encyclopedia of Child Guidance (83). Thompson 
(72) prepared a Glossary of Library Terms. 

Biographical directories—The eleventh edition of Who’s Who in Amer- 
ican Education (81) and a Who’s Who in Philosophy (64) were pub- 
lished. For the many other directories which have been issued, one is 
referred to the topic “Directories, Educational” in the Education Index. 


Publication Changes: New Magazines; 
U. S. Office of Education Reports 


New periodicals—Among the resources of the library of immediate
interest to educators are several new or changed periodicals. The Society 
for the Psychological Study of Social Issues, which has heretofore pub- 
lished its bulletin, began the Journal of Social Issues (69), a quarterly, 
in February 1945. Whereas its previous bulletin contained technical 
research, the new one is designed to get findings of social research under- 
stood by field workers in education, government, industry, and social 
work. The American Vocational Journal (9) began as a monthly in 
January 1945, replacing the quarterly American Vocational Association 
Journal and their News Bulletin. Higher Education (74) began in February
as a new semi-monthly medium of communication between the
Higher Education Division of the U. S. Office of Education and the field.
Biometrics Bulletin (8) began in February 1945, covering the application 
of statistical methods to problems of growth. 

U. S. Office of Education reports—Considerable curtailment in reports 
of basic data has occurred during the war. Many data now available are 
old; e. g., there has been no report on subject registrations in high school 
since 1933-34. At the time of writing the 1938-40 Biennial Survey of Edu- 
cation had not been published. When it appears, it will be a dual volume, 
combining both the 1938-40 and 1940-42 surveys. This combining, forced 
by the paper shortage and by budgetary limitations of the Office, means 
the loss of more than half of the material normally published, for the 
combined volume is expected to be smaller than the usual single volume. 
Tabulation of the next survey, for 1942-44, has started. 

For 1946 some expansion of the statistical function of the Office of 
Education is planned, in the form of a new statistical research service. 
This service is to be one section of a new division of central office 
services. 


Microfilming: New Epoch in Research Resources 


In this day of frenzied development, crowded with announcements of 
startling inventions which promise to change our ways of living in so 
many respects, one wearies of fantastic suggestions of the length to which 
new devices may carry us. But already actual practice in a number of 
lines has equalled the wildest dreams of a few years ago. The danger,
with respect to microfilming, does not seem to be that too much shall be
expected but rather that too little shall be executed—that we shall be 
bound by inertia to ways of great waste for years after technical develop- 
ments have been ready for use. 

Rider (62), librarian at Wesleyan University, Connecticut, points out 
that at present a book of 250 pages can be reproduced on the back of 
one 3x5 catalog card, and with slight advances in technic, 500 pages 
could be recorded on one side of a card. The front of the card would 
be reserved for a description and abstract in normal sized type. Rider 
does not suggest doing away with all books, as most of us like full-sized 
books; but he does recommend that the reductions be employed exten- 
sively by research libraries, where the expansion in recent years has been 
astounding and where the costs have been excessive. He analyzes the four 
main costs of a library—purchase of documents, cataloging, binding, and 
storage (building and equipment), and shows that micro-cards would 
afford relief. But the main advantage of micro-cards would be to the 
user—the material he seeks would be in the card catalog. Call numbers 
would disappear; even general indexes to periodicals would be unneces- 
sary as the periodicals would be reproduced article by article and each 
filed under its own subject in the card catalog. 

Rider believes that microfilming represents a greater step forward 
than did the change from papyri to flat books, and that we are at the 
beginning of a new era in library services to the research worker. He
states:


Is it possible, whether we realize it or not, that we are approaching the end of an
era in our library methodology? It is now sixty or seventy years since, under the 
compelling assurance of Dewey and Cutter and Poole and their fellow pioneers, the 
library world crystallized a definite pattern of library technique, which, although it 
has been greatly amplified and refined, has never been basically changed. There has 
even been a tendency in some library circles to take it for granted that it was a final 
technique. But no technology is ever final or finished. (62: 84) 


Fussler (32), head of the department of photographic reproduction, 
University of Chicago Libraries, wrote about the usual library uses of 
microfilm for condensation, preservation, and acquisition (as where books 
are out of print). The research worker is primarily interested in the 
reproduction service which makes all material in any library available 
to him without the expense of travel. A number of the large libraries 
now render this service regularly; the cost varies with the policy and 
with the length of material photographed. At the University of Chicago 
or at the Library of Congress, the cost of reproducing a 500-page book 
is about one cent per page. Shorter material may run three or four cents 
per page. But where several films are made from the same book for 
distribution to several libraries the cost is about one-fourth cent per 
page and may run as low as one-tenth cent. 
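
The per-page figures quoted above can be turned into totals for a whole volume. The following is a minimal sketch in Python, offered only as an illustration: the rates are the approximate figures given in the text, while the function name, the labels, and the choice of a 500-page book as the example are assumptions made here. At these rates a 500-page book comes to about $5.00 for a single film, about $1.25 per copy when several films are made together, and about $0.50 per copy at the lowest quoted rate.

```python
from fractions import Fraction

# Approximate rates in cents per page, taken from the figures quoted in the
# text. The labels are illustrative assumptions, not terms used by the
# libraries themselves.
RATES_CENTS_PER_PAGE = {
    "single film of a long book": Fraction(1),
    "several films from one book": Fraction(1, 4),
    "several films, lowest quoted rate": Fraction(1, 10),
}


def reproduction_cost_dollars(pages, cents_per_page):
    """Return the cost in dollars of filming `pages` pages at a per-page rate.

    Exact fractions are used so that rates such as one-tenth cent do not
    pick up rounding error before the final conversion to a float.
    """
    total_cents = Fraction(pages) * Fraction(cents_per_page)
    return float(total_cents / 100)


if __name__ == "__main__":
    pages = 500  # the book length used as the example in the text
    for label, rate in RATES_CENTS_PER_PAGE.items():
        cost = reproduction_cost_dollars(pages, rate)
        print(f"{label}: ${cost:.2f} for {pages} pages")
```

The sketch leaves out the three or four cents per page quoted for shorter material, since the text does not say at what length that rate begins to apply.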

Three supplements to the Union List of Microfilms, originally published 
in 1942, have been issued (73), bringing the number of microfilms listed 
to nearly 15,000. Cibella (23) reported briefly on the history of micro- 
filming, noting a strong impetus in the middle thirties. Shaw (66) reported 
cost studies within the range already mentioned. 

Of the more than sixty references on microfilming of the past three
years, four (2, 49, 70, 82) will suggest the wide variety of current uses
by libraries, municipal, state, and federal governments, business, and the
Army. Many articles are found in the Library Quarterly, the Library 
Journal, and the Journal of Documentary Reproduction. The last named 
was started in 1938 by the American Library Association but was dis- 
continued during the war. For those who wish to pursue the subject, 
references will be found under the topics “Microfilms,” “Microphotog- 
raphy,” “Microprojectors,” and “Books—photographic reproduction and 
projection,” in the Education Index, Readers’ Guide, International Index, 
and Industrial Arts Index. 

The Joint Committee on Materials for Research was formed in the early 
thirties by the American Council of Learned Societies and the Social 
Science Research Council, with Robert C. Binkley as chairman (his 
Manual was listed in the 1942 Review). The committee was dissolved 
recently, following the death of its chairman, but it saw many of its ideas 
and hopes being carried out by the Work Projects Administration His- 
torical Records Survey Project (103), the W. P. A. Project for a Survey 
of Federal Archives outside the District of Columbia, and by the committee
of the American Historical Association concerned with source
materials. The files of the Joint Committee have been placed with the 
American Council of Learned Societies (1219 Sixteenth St. N.W.,
Washington, D. C.).


DOCUMENTARY RESEARCH 


Historiography: the Production of Written History 


Rubin (114) outlined some of the services which statistics can render 
to historical research. In addition to the simpler, direct uses of statistics 
(92, 112) he pointed out possibilities for sampling, testing theories by 
statistical means, the use of probability in the identification of manu- 
scripts (external criticism), and the detection of trends. Statistics was born 
among the social sciences; among them it has a future of great opportunity. 

Zucker (125) prepared an extended American history advertised as 
“the science of history . . . the now-identified fundamental laws by which 
the past can be directly interpreted through cause and effect.” Such is 
the aim of every historical writer; the degree to which Zucker has achieved 
the aim can be judged after a study of his work by those interested in 
the fundamental problem of historical interpretation. MacIver (107)
discussed social causation at some length from the sociological point of view.
He noted the shortcomings of contemporary social science and the weak- 
nesses of statistical studies and of operationalism. He argued for causes 
which were actually meaningful in the lives of the individuals involved 
(p. 391). 

Saunders (115) discussed historical scholarship—the current training 
of historians for research. Gottschalk (97) and Kidder (103) dwelt on 
historical sources, the latter delineating in considerable detail the con- 
tribution of the Historical Records Survey. Martz and Smith (108) have 
published source material on the history of education in Indiana, dealing 
with the territorial period. This publication is the first part of a long-term 
project of gathering and publishing copies of original records concerning 
early education in the state. Documentary reproduction, treated in the 
preceding section, must be considered in connection with source materials. 
Fruitful references, both on sources and on history writing, will be found 
in the Readers’ Guide under such topics as: “Archives—United States,” 
“Historical Research,” “History—Historiography,” and “World War,
1939—Historiography.”


Historical Research Studies in Education: Examples 


Barth (85) dipped back into early American history to describe the 
interplay between the pattern of Franciscan education (1502-1821) and 
the existing social institutions—the family, the church, the school, the 
economy, and the government. Dickerman (88) dealt with more recent 
history, tracing the development of the summer school in American
universities. Richey (112) dealt with the Civil War period in a way 
which adds to our basic knowledge; he made a painstaking statistical 
study. Flockstra (93), Ligon (106), and Williams (123) made historical 
studies based on legislation and court decisions. Three studies, by Good 
(96), Nietz (110), and Smith (119), dealt with textbooks. Two sociologists, 
Bernard and Bernard (86), traced the origins of American sociology; 
and the American Psychiatric Association (84) issued a review of one 
hundred years of psychiatry in the United States. 


Educational History—Reviews of Recent Periods 


Three magazines celebrated anniversaries during the past triennium. 
The Journal of Educational Research in January 1945 (Volume 38, p. 321- 
400) recognized the end of its twenty-fifth year of publication by articles 
which reviewed its own history and that of various phases of American 
education over the same period. Historical notes on the Journal itself 
were prepared by Ashbaugh, Charters, and Woody. Reviews of educa- 
tional developments were written on elementary education by Brueckner 
(87), on secondary education by Douglass (89), on educational measure- 
ments by Monroe (109), and on administrative research by Scates (116). 

The American Journal of Sociology recognized the end of its fiftieth 
year with the May 1945 issue (Volume 50, p. 421-563). The issue was 
devoted to articles which traced developments in sociology during the 
past fifty years, including research methodology (p. 474-82). [These 
reports may be compared with the review by Bernard and Bernard (86).] 

The Psychological Review for January 1943 (Volume 50, p. 1-155) 
celebrated the semi-centenary of the American Psychological Association, 
as well as the centennial of the birth of William James (1842). Papers 
by Fernberger (92), Jastrow (99), Woodworth (124), and others out- 
lined the history and progress of psychology and of the Association. 

Other reviews of decades or quarter-centuries dealt with the progress 
of teacher education, by Evenden (91); child guidance, by Stevenson 
(120); and special education (100). Knight (105) wrote on a century 
of teacher education, and Quattlebaum (111) reviewed the educational 
activities during the war of federal emergency agencies. Eells (90) has 
continued his lists of centennial retrospects. 


Legal Studies 


Since it became the policy of the editorial board of the Review in 1939 
to treat educational legislation in connection with the area to which the 
laws applied, it is reserved for the present section to deal only with general 
summaries of school law and historical studies based on legislation. The 
Yearbook of School Law is one of the war casualties since 1942. No general 
summary has taken its place. Many partial summaries have been prepared 
dealing with particular phases of education or with particular 
states. Those who wish to review these are referred to the Education Index, 
topic: “Educational Laws and Legislation.” 


Quantitative Documentary Analyses 


Systematic documentary studies may be made for a number of reasons, 
such as to trace chronological trends, to ascertain the character of books, 
or to gain knowledge about certain items contained in the publications. 

Smith and others (119) traced the history of textbooks in arithmetic 
for the past 150 years, giving spatial analyses at different periods (p. 28). 
Shanas (117) reviewed articles in the American Journal of Sociology 
for the past fifty years, giving the percent of space devoted to different 
topics in each five-year period. These findings are interwoven with other 
factors to interpret changes in point of view and emphasis in sociology. 
Brueckner (87) analyzed issues of the Journal of Educational Research 
with respect to their contribution to elementary education at three dates. 
Fernberger (92) analyzed the number of papers on different subjects 
which were listed on programs of the American Psychological Association 
during fifty years. Gatlin (95) analyzed six educational periodicals from 
1911-1940 to ascertain trends in the teaching of high-school grammar 
and composition. 
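
The space tabulations described above reduce to a simple computation. What follows is a minimal sketch in Python, not the procedure of any study cited here. The record layout (year, topic, pages), the function name `space_share_by_period`, and the sample figures are all assumptions made for illustration.

```python
from collections import defaultdict

def space_share_by_period(records, period_years=5):
    """Percent of page space given to each topic within each period.

    records: iterable of (year, topic, pages) tuples (hypothetical layout).
    Returns {period_start_year: {topic: percent_of_period_pages}}.
    Assumes every period has a positive page total.
    """
    pages = defaultdict(lambda: defaultdict(float))
    for year, topic, n_pages in records:
        start = year - (year % period_years)  # e.g. 1897 falls in 1895
        pages[start][topic] += n_pages

    shares = {}
    for start, by_topic in pages.items():
        total = sum(by_topic.values())
        shares[start] = {t: 100.0 * p / total for t, p in by_topic.items()}
    return shares

# Invented figures, for illustration only.
sample = [
    (1896, "theory", 30), (1898, "method", 10),
    (1941, "theory", 20), (1943, "method", 60),
]
shares = space_share_by_period(sample)
```

Comparing the percentages for one topic across successive periods gives the kind of trend line such studies report; the choice of topic categories remains a matter of judgment that the arithmetic cannot settle.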

Shannon and Kittle (118) analyzed eight textbooks in the field of re- 
search methodology for the purpose of comparing the proportion of space 
given to different subjects. They sought answers to such questions as: 
“What is being taught in courses in educational research? Where is the 
emphasis placed?” Fredenburgh (94) analyzed fourteen texts on guidance 
and rated them. Kinney (104) analyzed primary readers to see how well 
adapted they were to the ideas and experiences of rural children. 

Kearney (102) ascertained the sentence lengths in 121 first-grade 
readers. Hughes (98) analyzed the topics dealt with in articles about 
education appearing in lay magazines. Kardatske (101) analyzed data 
concerning school superintendents listed in a biographical directory. Two 
studies of word usage will be mentioned, one by Rinsland (113), and 
one by Thorndike and Lorge (122); a full review of these should appear 
in the next issue of the Review devoted to the Language Arts. 


CHAPTER II 
The Case Study as a Research Method 


PERCIVAL M. SYMONDS 



Since the reviews by Olson in the December 1939, and by Strang in the 
December 1942, issues of the Review of Educational Research of the 
use of the case study in research methodology, progress has been made in 
this field. First, the case study has been of increased value to students 
of research in education, psychology, sociology, and anthropology; second, 
progress has been made in the technics of gathering and treating case 
study data for research purposes; and third, case material has been em- 
ployed in many significant investigations. 


The Acknowledged Value of the Case Study 
as Research Method 


Whereas a few years ago the case study was often looked upon with 
suspicion as a method of research and whereas methods of gathering data 
by group processes that better lent themselves to statistical handling were 
favored more, the literature of the last three years has stressed the use 
of case material at least as a complementary and sometimes as a superior 
research methodology. Thus Hill and Ackiss (29) called the case method 
“a basically sound approach” to sociological research and held that “this 
methodology, moreover, bridges the gap between the stereotyped, factual 
community survey, and the personality-culture community study.” Miles 
(44) referred to it as “one of the most important research methods of 
sociology.” Cantor (11) and Lonsdale (40: 647) pointed out its value 
in social work research with the latter reporting that one private social 
agency, recognizing its worth, has engaged John Dollard to prepare “a 
design for research in case-record material.” Riemer (45: 194) wrote that 
in criminological research to get at the units of causation “we shall have 
to direct our attention more eagerly to the study of the individual case.” 
Young (60) while recognizing that “without quantification there can be 
no science” asserted that “with adequate concepts, careful observation of 
well-drawn small samples and the use of logical analysis some very sub- 
stantial generalizations may be derived.” Angell (2: 214-215) stated that 
in orthopsychiatric research “for real scientific work ... a great deal 
of what has gone under the name of case study is prerequisite,” but he 
adds the following significant comment: “The grave danger in working 
exhaustively with a few cases in order to obtain good analytical ‘hunches’ 
is that the investigator will become so involved in analytical speculation 
that he will never frame definite hypotheses, or that if he does reach this 
stage he will never subject them to the empirical test. Just as many statisticians 
are inclined to leave the matter with an inconclusive correlation 
coefficient so the student of cases often fails to carry through to the 
verification of his hypothesis.” In family life research Rockwood (47: 
647) found that the once utilized historical, descriptive, and typological 
approaches “have been almost entirely replaced by the sociometric and 
case study approaches.” Adequate genetic research, Good (27) held, also 
should make use of case methods. 

Sarbin (50: 600) who compared the prediction of college achievement 
by clinical counselors using case study methods and by regression equa- 
tions derived from nonclinical data reached the conclusion that actuarial 
predictions are more important than predictions based on case study 
material. He stated “the clinical predictions add nothing to the actuarial 
prediction.” In a later paper Sarbin (51: 214) concluded that “the opera- 
tions of those who reject the statistical method of prediction and substitute 
for it a ‘dynamic’ clinical or individual prediction may be described in 
one of two ways: Either they are making statistical prediction in an 
informal, subjective, and uncontrolled way, or else they are performing 
purely verbal manipulations which are unverifiable and akin to magic.” 

Sarbin’s experiment seems conclusive within the circumscribed field in 
which he operated but inasmuch as case study methods vary according 
to the methods, skill, and judgment of the worker, this experiment by no 
means settles the issue. Sarbin has been challenged by Chein (16) who 
accused him of possessing a narrow conception of the clinical approach. 
Chein pointed out the fact that the clinician is not primarily interested 
in prediction but in effecting change. He would contrast statistical predic- 
tion with experimental control and believes that the cause and effect 
relationships that are shown as one alters the situation and notes the changes 
produced have more significance than the static relationships which are 
shown by statistical correlation. 

Not only has the case study method been extolled but other writers 
have been critical of the statistical as opposed to the case study approach 
to personality research. Thus Frank (2: 246) deemed it “highly ques- 
tionable whether the mathematical and statistical operations we use are 
valid instruments for biological and social data.” Bloch (7: 504) held 
that much of the research which uses multiple correlation actually “leaves 
us with very little more at the end than we had when we started.” Bowman 
(8: 308) stated that “sociological data seldom yield to quantitative ex- 
pression at the present stage of development.” White (54) held that 
“the fundamental problems of sociology, as of ethnology and social 
anthropology, are essentially and intrinsically nonmathematical prob- 
lems.” Witmer (59: 2) pointed out that statistical methods and mathe- 
matical calculations, tho often identified with research, “are but tools 
to inquiry, at times appropriate to the problem at hand and at times 
not at all pertinent.” And Maslow (43: 558) accused traditional mathe- 
matics and logic of being “handmaidens in the service of an atomistic, 
mechanical view of the world” and of therefore being inappropriate in 
their present form to a holistic-analytic understanding of human personality. 

Obviously, then, case study research methodology, while still under 
attack, has had stout support from many authorities during the past 
three years. 


Recent Innovations in the Gathering and Treatment 
of Case Study Research Data 


During the three-year period there have been three major trends of 
development in the gathering and treatment of case study data for pur- 
poses of research. First: there has been an attempt to objectify both the 
collection and the analysis of case material so that subjectivity of inter- 
viewing and interpretation will be minimized; second: some definite 
progress has been made in devising new technics of analyzing, categorizing, 
and quantifying relatively amorphous case data; third: there has been 
a steady rise in the application of involved statistical procedures, includ- 
ing factorial and variance analysis, to case study material. 

As regards the objectifying of case interviewing, Covner (18, 19, 20, 
21) and Rogers (48, 49) have been particularly instrumental in develop- 
ing technics of recording interviews phonographically in order to make 
them more usable for research and teaching purposes. Covner (18: 112) 
found that normally more than 70 percent of actual interview material 
was omitted from case reports and stated that “research possibilities 
seem almost limitless” for the recorded interview technic. Rogers (49: 
433) reported that the material gained from recorded interviews “is 
priceless raw material for research.” Rogers (48) has also pointed out 
that in nondirective interviewing valuable research data are gathered. 
He points out that the counselor has done nothing to bias the material, 
and there are no evaluations which arouse defense or shut off expression 
so that the material gained is a “chemically pure” expression of the client’s 
attitudes. He believed that nondirective interviewing also leads to im- 
proved reliability of the report. Child (17: 318) advocated that case 
study data be used “for the construction of quantitative scales, comparable 
to those commonly derived from tests and questionnaires.” Child (17: 
309) also pointed out that analyses of case material could be made more 
objective by having the cases “independently evaluated by several com- 
petent judges.” 
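
One simple way to summarize how far several independent judges agree is the proportion of pairwise comparisons in which two judges place a case in the same category. The sketch below is an illustration only and is not drawn from Child's paper; the function name `pairwise_agreement`, the judge labels, and the ratings are hypothetical, and the index makes no correction for agreement expected by chance.

```python
from itertools import combinations

def pairwise_agreement(ratings):
    """Proportion of (judge pair, case) comparisons that agree.

    ratings: {judge_name: [category assigned to each case]}, with every
    judge rating the same cases in the same order (assumed layout).
    """
    agree = 0
    total = 0
    for j1, j2 in combinations(list(ratings), 2):
        for r1, r2 in zip(ratings[j1], ratings[j2]):
            agree += (r1 == r2)  # True counts as 1
            total += 1
    return agree / total

# Invented ratings of four cases by three judges.
judged = {
    "judge_a": ["x", "y", "x", "z"],
    "judge_b": ["x", "y", "y", "z"],
    "judge_c": ["x", "x", "x", "z"],
}
agreement = pairwise_agreement(judged)
```

A low value would suggest that the categories or the instructions to judges need tightening before the case material is quantified.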

There has been some discussion of methods of categorizing and quan- 
tifying uncrystallized case study data. Ackerson (1: 41-42) discussed 
such issues as (a) the prejudicial attitudes or beliefs on the part of the 
informant or examining staff; (b) the varying completeness of the case 
material; and (c) inadequate defining or grouping of terms by the 
indexers. He discussed certain statistical tests which might be employed 
in deciding whether to keep separate or to merge apparently similar and 
overlapping rubrics. Creegan (22) discussed categories to be used in 
the analysis of personal documents such as diaries, letters, or autobiographies. 
Andrews and Muhlan (3: 108-109) set forth a method of analyzing 
“congruent idea patterns” in the study of personal documents. 
By analyzing data into categories and graphing the frequency with which 
each of these occurred in conjunction with each other category, it was 
possible to find those that showed the same pattern of frequency. Dory 
(23: 285) reported the development of an index of 1275 titles of the 
headings used in psychiatric classifications to be used in the classification 
of psychiatric case records. 
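
The tabulation of categories occurring in conjunction with one another can be sketched as follows. This is one possible reading of the procedure, offered under stated assumptions: each passage of a document is taken to be coded as a set of categories, and a Pearson correlation between co-occurrence profiles is substituted here for the graphical comparison the authors describe. The function names and sample codings are hypothetical.

```python
from itertools import combinations
from math import sqrt

def cooccurrence(passages, categories):
    """Count how often each pair of categories is coded in the same passage."""
    counts = {a: {b: 0 for b in categories} for a in categories}
    for passage in passages:
        present = sorted(set(passage) & set(categories))
        for a, b in combinations(present, 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def profile_correlation(counts, a, b):
    """Pearson correlation between the co-occurrence profiles of a and b,
    taken over the remaining categories only."""
    others = [c for c in counts if c not in (a, b)]
    xs = [counts[a][c] for c in others]
    ys = [counts[b][c] for c in others]
    n = len(others)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # flat profile: correlation undefined, reported as 0
    return sxy / sqrt(sxx * syy)

# Invented codings of six passages, for illustration only.
cats = ["home", "school", "fear", "play"]
coded = [
    {"home", "fear"}, {"school", "fear"}, {"home", "fear"},
    {"school", "fear"}, {"home", "play"}, {"school", "play"},
]
table = cooccurrence(coded, cats)
similarity = profile_correlation(table, "home", "school")
```

Categories whose profiles correlate highly show "the same pattern of frequency" in the sense used above; with only a handful of categories, as in this toy example, such correlations are of course unstable.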

Along this line, work has been done in an attempt to determine the 
fundamental structure of personality; and the results of this work may 
become of service in the classification of case study material. Marzolf 
(42) discussed the statistics of syndrome formation in terms of the 
correlation and grouping of the symptoms which go to make up a given 
syndrome. He discussed the various types of factor analysis which might 
be employed in determining the structure and causative background of 
a given syndrome. Marzolf expected that insight would be thrown on 
the nature of a syndrome by determining the matrix of correlations between 
the symptoms of which a syndrome is made up and the antecedent con- 
ditions which precede them. He pointed out that diagnosis, statistically, 
is the reverse of prediction and that the clinical worker is as interested 
in knowing what factors have cooperated to produce a certain result as 
he is to know what result will follow from a given combination of factors. 
Cattell (12: 581-583) advocated factor analysis to “detect and delimit 
common dynamic unities” in personality traits with special recommenda- 
tion for the employment of what he called the methods of temporal 
covariation and temporal invariance in determining trait unities. Maslow 
(43: 546), tho pointing out the necessity for caution regarding the data 
correlated and the tempering “of all statistical with clinical and experi- 
mental knowledge,” stated that “there is no reason why correlation tech- 
nic should not be highly useful in a holistic methodology.” And Ackerson 
(1: 14), in a sizeable volume, demonstrated how case 
study data on thousands of subjects could be presented “almost entirely 
by the comparison of correlation coefficients.” 

Finally, what might be called the very heart of case study research 
methodology, the formulation of integrated (holistic, total, organismic, 
dynamic) concepts of human behavior from the analysis of case material, 
advanced in the period under consideration. Thus, Riemer (45: 201) 
showed how, from close scrutiny of case studies, “ideal types” could be 
ferreted out whose predictive values could then be statistically tested, 
within certain logical limits. Witmer (59: 10) reported how categoriza- 
tion and quantification of case records may be done so that relationships 
are shown in tabular form “while at the same time the individuality of 
the cases is to some extent preserved.” Bellak and Jacques (5: 38) pointed 
out that case studies can be seen properly only when the three main 
levels of personality, the biological, psychological, and sociological, are 
adequately integrated “without unduly emphasizing the importance of this 
or that set of facts.” And Child (17: 318), while endorsing the employment 
of quantitative technics, held that “the investigator’s total evaluation of 
the individual subject” was most important because “the scientific utilization 
of the investigator’s total impressions of the individual may be a 
prerequisite to the eventual erection and measurement of the most sig- 
nificant variables.” 

In several major respects, therefore, substantial recent progress has 
been made in the development of case study collection and treatment for 
research purposes. 


The Employment of Case Study Technics 
in Significant Researches 


In the actual use of case study material for research purposes, there 
have been two notable trends during the past three years. First of all, 
several comprehensive single case studies have been reported in the 
literature in an effort to give insights into the kinds of personalities who 
were their subjects and to aid in the development of the technic of 
presenting longitudinal case histories. Thus, White (55, 56, 57) has re- 
ported extensively on the personality of “Joseph Kidd.” White, Tompkins, 
and Alper (58) have presented a “realistic synthesis” of one subject, 
including a complete case history and records of many personality tests. 
Burlingham and Freud (10) have given several reports on the development 
of the child “Tony.” Lindner (39) has devoted an entire volume 
to the hypnoanalysis of one criminal psychopath. Jones (36) has pre- 
sented a book-length longitudinal study of a normal adolescent boy. 
Robinson (46) has given us two rather complete case records showing 
the personality changes that take place in the course of the social case 
work process. And Laton (38) has summarized the life history of “Penelope 
Pride,” a normal woman of the nineteenth century West. 

The second important trend in the utilization of actual case studies 
for research purposes has been the publication of many articles and books 
reporting projects and experiments primarily or entirely based on case 
material. Thus Biber and others (6) made an intensive study of ten 
seven-year-old children who had been students at the Little Red School- 
house in New York City. From this intensive study of individual children, 
she was able to point out certain characteristics of this age level. DuBois 
(24) reported an anthropological study of inhabitants of Alor Island 
in the East Indies by making intensive case studies of a few individuals 
in a community in which she lived. She was able to draw conclusions as to 
the character of the Alorese culture. Jenkins (35) reviewed the cases of 
many Negro children of Binet IQ 160 and above. Beecher (4) analyzed 
twelve case histories of behavior problem pupils. Martin (41) examined 
3000 case studies in order to determine parental attitudes and their 
influence upon the personality development of children. Symonds (52) 
studied the needs of fifty teachers as shown in their autobiographical 
histories. Ellis (25) employed eighty-four medical case studies in an 
investigation of the psychology of human hermaphrodites. Hurewitz (34) 
went over twenty-five cases to discover some criteria for judging appli- 
cants’ ability to utilize family agencies’ services. Hollis (31) analyzed 
several cases to try to determine the effects of the war on marriage rela- 
tionships. Landis and Bolles (37) studied one hundred physically handi- 
capped women to explore their personality and sexuality. Hirt (30) quoted 
ten case histories of superior children in her study of IQ changes. Acker- 
son (1) also brought out the second volume of his study of children’s 
behavior problems, which he based upon case studies of 2113 boys and 
1181 girls. And in the field of medicine, particularly its psychosomatic 
aspects, important studies too numerous to mention here were published 
mainly or wholly backed by case study reports. 


Summary 


The past three years have been important ones in the development of 
the case study as a method of research. During that time, the case study 
has been highly valued in many fields of research, has been refined and 
augmented along several consequential lines, and has been utilized in 
many notable research studies. However, much remains to be done to 
improve its methodology so that case materials may be amassed and 
treated in a manner that includes, on the one hand, objective appraisal and 
statistical integrity and that, on the other hand, never loses sight of the 
integrated, dynamic, holistic picture of human personality which the case 
study approach to research uniquely may give. 


CHAPTER III 


Trend, Survey, and Evaluation Studies 


IRVING LORGE and HARRY ORDAN 


This chapter continues the reviews of evaluative studies and of survey 
and trend studies for the period July 1942 to June 1945. The concept 
of evaluation is being extended and adapted not only within the area 
of appraisal but also in survey, trend, and large-scale testing programs. 
While questionnaire studies still predominate, there is an increasing trend 
to the use of other methods of obtaining observational evidence. Further, 
there is a tendency to use the results of previous surveys, records, and 
observations in conjunction with follow-up studies to point up trends in 
education. 


Critical Background for Evaluation 


Many so-called evaluation studies are defective in one way or another. 
Certainly the defects that Davis (35) found as limiting the applications 
of psychological research to schoolroom learning are relevant. He found 
these defects to be: (a) lack of adequate preplanning; (b) failure to 
determine and report validity and reliability of instruments; (c) inade- 
quate time duration of studies; (d) faulty sampling; (e) generalizations 
unwarranted by the observations; (f) lack of confirmation of previous 
studies; (g) lack of standardization of research procedures. Such inade- 
quacies and others are the basis of Scheinfeld’s criticism and reappraisal 
of Goddard’s Kallikaks (138). 

The appraisal controversy between evaluation and measurement con- 
tinues. Scates (137) contrasted the differential objectives of scientist and 
teacher in measurement asserting that there was a fundamental limitation 
in purely scientific approaches to measurement of child growth. Sims 
(143), however, implied that the problem was the distinction between 
observation of phenomena and the values that such data have for educa- 
tion. Evaluation for Monroe (102) was the explicit measurement of all 
aspects of educational growth, thus requiring the defining, identification, 
and appraisal of all behavior related to educational objectives. Courtis 
(23) attempted to clarify the semantic confusion by quoting Thorndike’s 
definition: “a pupil’s score in a test signifies just such and such particular 
achievement and second only whatever has been demonstrated by actual 
correlation to be implied in it.” Courtis suggested that the fault was more 
often with the tester [or interpreter] than with the test or other observa- 
tion. On this basis he (24) suggested ten steps in educational measure- 
ment including the sponsorship of maturation units. Barr (8) editorialized 
succinctly on the problem. 

Cook (21), Cowell (25), and Smith (144) reviewed recent procedures 
in evaluation. Smith mentioned nine bases for evaluation of curriculums 
including mastery of basic skills, ways of thinking, understandings, and 
insights as revealed in social behavior, gains in knowledge thru attack 
on personal and social problems, interests as related to activities, personal 
initiative and creative power, sincerity and potency of attitudes, and post- 
school vocational competences. Cook discussed various purposes of 
evaluation and evaluated some of the technics. Tyler (159) indicated the 
relation of evaluation to functional supervision with a specification of 
the six basic assumptions, procedures, the use of evaluation in improving 
instruction, and the results of evaluation. Ragsdale (127) related rural 
community planning to curriculums in rural education. 


Materials and Problems in Evaluation 


The use of personal documents is surveyed critically by Allport (1) 
indicating their variety and values. The manual prepared under the 
chairmanship of Guthe (59) gave an excellent review of methods 
of collecting data, experimental technics (59: Chapter V), and a bibliog- 
raphy of 682 titles on food habits. 

The problems attendant to an evaluation program were discussed by 
Kirkendall (80) and Houle (69). The technic for beginning an evalua- 
tion program was related to the analysis of educational objectives (48). 
Sweetser (155) emphasized the relationship of neighborhood research 
to interpersonal interaction. 

Lamson (85) demonstrated that college freshmen can be objective 
toward evidences of their intellectual ability and academic achievement. 
The third volume on the Eight-Year Study (145) gave a critical appraisal 
of the development and use of the evaluation instruments: aspects of 
thinking (interpretation of data, application of principles, logical reason- 
ing, nature of proof); social sensitivity (application of social values, 
application of social facts, and generalizations to social problems, social 
attitudes, social and economic beliefs) ; aspects of appreciation; interests; 
personal and social adjustment; and record forms. Pace (122) reported 
on the construction of a situations test arising out of student suggestions 
for appraisal of teacher-training instruction. Ordan (118) surveyed the 
development of social concepts thru tests of vocabulary recognition, 
vocabulary classification and interpretation, and values of social concepts 
thru the headlines test in Grades IV thru IX in New York City public 
schools. The Santa Barbara Behavior Rating Scale (146) was developed 
and applied in conjunction with a revised curriculum. 

A practical procedure for interpreting pupils’ gains in test-retest 
scores was presented (157). Bolton (12) rediscovered McCall’s suggestion 
for evaluating teaching effectiveness thru achievement test scores. 
As is usual, there is no adequate consideration of the regression error. 
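
The regression error can be illustrated with a short computation. The sketch below is a textbook illustration and not a procedure taken from the studies cited. It assumes that both testings have the same mean and standard deviation, that no real change has occurred, and that the test-retest correlation is known; the function names and the figures are hypothetical.

```python
def expected_retest(score, mean, retest_r):
    """Retest score expected from regression toward the mean alone.

    Assumes equal means and standard deviations on both testings and
    no true change; retest_r is the test-retest correlation.
    """
    return mean + retest_r * (score - mean)

def regression_adjusted_gain(pre, post, mean, retest_r):
    """Observed gain beyond what regression alone would produce."""
    return post - expected_retest(pre, mean, retest_r)

# Invented example: a pupil 10 points below a group mean of 50,
# with an assumed test-retest correlation of 0.8.
baseline = expected_retest(40, 50, 0.8)
adjusted = regression_adjusted_gain(40, 45, 50, 0.8)
```

Under these assumptions a pupil who scores below the mean is expected to score somewhat higher on retest even with no instruction, so a raw gain credited to the teacher overstates the effect for low scorers and understates it for high scorers.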


Review of Educational Research Vol. XV, No. 5 


Evaluation Studies 


The report of Troyer and Pace (158), Evaluation in Teacher Education, 
has been an important contribution. It described and analyzed evaluation 
from the point of view of institution and individual in selection, orienta- 
tion, guidance, follow-up, and growth in service with dozens of specific 
appraisals including a section on workshops. Hildreth (65) appraised 
the University of Utah Workshop. The important Minnesota studies in 
evaluation were extended by Williams’ appraisal (173) of a representa- 
tive sampling of a hundred students using interviews with their mothers, 
with both parents, check lists and questionnaires, interviews with students, 
data from health records, and tests to appraise the “consumers of general 
education”; and by Eckert’s (46) follow-up of seven hundred former 
students to estimate their readiness for continued learning, orientation 
to personal problems, home and family living, vocational readiness, and 
socio-economic competence. Wrightstone (176) wrote a brief review of 
the evaluation of the New York City Activity School Experiment which 
used tests of basic skills, critical thinking, current affairs, attitudes, and 
personality, the School Practices Questionnaire, and the New York State 
Education Department Scale for Rating Elementary Practices. 

Self appraisals have been utilized extensively. The Cooperative Test 
Service (28) has related college student opinion on adequacy of their 
training to scores on the cooperative tests. Ashbaugh (7) had freshmen 
and seniors rank educational objectives. Criteria for rating or ranking 
faculty by students were developed and used (6, 116), and under Dalin 
(31), a survey of college student opinion was made. 

Using his “Criteria for Teaching and Learning Materials and Practices”, 
Bruner (16) had 945 teachers and administrators evaluate their own ideals 
and practices, finding a wide gap between ideals and practice. Antell (5) 
reported on teacher opinion of the value of different supervisory practices; 
Jensen (75) had students appraise eighty-three courses in teacher-training 
institutions on a nine-point scale; and Corey and Froehlich (22) reported 
a study of pupils’ acceptance of responsibility. 


Evaluation of Methods—Experimental 


Seven studies were made of the consequences of membership in 4-H 
projects as contrasted with nonparticipation. Frutchey and his co-workers 
(49, 50, 51, 52, 53, 54, 55) appraised educational growth in terms of 
objectives of information, self-confidence, attitudes, experiences, habits, 
school plans, and vocational plans. The duration of the studies was usually 
five months altho in the dairy project it was eleven months. The method 
involved the evaluation of gains from pretest to posttest comparing partici- 
pants with nonparticipants and also contrasting participants who com- 
pleted their projects with those who did not. In some of the reports, the 
members evaluated sources of help and specific printed materials in terms 
of helpfulness. 

Peters (123), using his regressional technic for equating pupils, studied 
the results of democratized education in terms of factual knowledge, civic 
beliefs, school practices, and achievement. Using tests of abstracting and 
organization of information and of drawing conclusions, Anderson, Mar- 
cham, and Dunn (2) reported the results of doing versus telling on critical 
thinking. Weiden (170) using a three-group procedure evaluated the effects 
of giving marks, giving marks with correct answers indicated, and giving 
marks, correct answers and remedial instruction in algebra. Curtis (30) 
used a rotational paired group procedure to evaluate excursions. In a 
five-term study, Ryder (135) appraised the effects of student teachers 
on pupils in terms of achievement, attitudes, and appraisal of teachers. The 
method matched pupils of regular teachers with pupils of student teachers. 

Johnson (76) surveyed the results of changed curriculums in arithmetic 
on pupil achievement, and Overn (120) appraised the effects of the 
specific teaching of patriotism. Jayne (74) in a rotated group experiment 
evaluated the immediate and retention effects of lecture versus silent 
film presentation. Morgan and Steinman (105) evaluated a testing pro- 
gram in teaching of educational psychology. Hamalainen (60) made a 
critical appraisal of the results of teachers’ anecdotal recordings in terms 
of the relationship of the teachers’ appraisal of pupils versus objective 
test information. His study would have been more valuable if extended 
another year or two. 

Pintner and Gates (124) directed the study on the value of hearing aids 
for auditory handicapped pupils. The experiment showed no statistically 
significant difference for matched pupils wearing aids and those not wear- 
ing them in Stanford achievement, aspects of personality, pupil portraits, 
or speech. Duell and Kenet (40) found that pupils attending summer high 
schools gain more than regular pupils. Goldstock (57) in a five-year 
study of remedial readers gave a statistical summary of the results. 


Testing Programs 


A continuation of the Orleans and Saxe studies of arithmetical knowl- 
edge was made. This time (119) two thousand New York City high-school 
pupils took a test of arithmetic reasoning. The evaluation was made in 
terms of variation, details of knowledge, difficulties, errors, frequency of 
errors, and probable cause. Eaton (42) reported the results of the Stan- 
ford Intermediate Arithmetic Test. No marked difference in the achieve- 
ment of pupils as related to the time devoted to arithmetic study was one 
of his conclusions. He (43) arrived at the same conclusion in the survey 
of social studies using the Stanford Intermediate Social Studies Test. 
Are Rice’s conclusions, developed in the 90’s, being reconfirmed? 

Davis (33) analyzed the results of the Kentucky Scholastic Ability, 
Kentucky English, Kentucky Mathematics tests, rating scales for social 
ideals and emotional drive for a stratified sampling of 1940 high-school
graduates as related to college or noncollege attendance. The factor of
college attendance was studied in relation to quarters of the distributions
of the test scores, to place of residence, size of family, income, socio- 
economic status, and family income. 

The sophomore and freshmen testing program in Michigan high schools 
(175) reported on mental capacity of pupils, interpretation of social 
data, interests, adjustments, and behavior patterns. Dalton (32) made a 
visual survey of elementary- and high-school pupils using the Keystone 
Telebinocular. 


Trend Studies 


Several important demographic studies of immediate interest to the 
educator have been published. Beers and Williams (10) studied the age 
structure of Kentucky’s population from 1860 to 1940 for the state, for 
rural and urban regions, by counties and by subregions. Oyler (121),
for the same state, reported trends in fertility and migration in relation 
to various factors (for education, he used the ratio of high-school attend- 
ance to elementary-school enrolments). Glick (56) reported on family 
trends from 1890 to 1940; Crane (26) indicated industrial and occupa- 
tional trends in New York State from 1910 to 1940. 

Based on a questionnaire, the National Education Association (110) 
has continued its study of salaries of city-school employes including trends 
from 1930-31 to 1942-43, and 1944-45 (111). 

A significant use of high-school reports was made by Landis (87, 88, 90, 
91) in appraising the posthigh-school activities of Washington State 
graduates. Landis pointed out that the data are limited by the reliability 
of the principal’s reports, the categories of activities, and the fact that 
about a quarter of the graduates’ activities are unknown. 

Two trend studies appraised the effects on schools in the war period; 
one on 1426 school systems (112) and the other on teachers colleges (113). 
A series of reports (3) showed the progress of adult education in the war 
period. 

The progress of adult education can be compared with the trend in 
public-school adult education in the cities from 1929 to 1939 (62). 
Curriculum trends for a ten-year period in Iowa (104), trends in consumer 
education (61) from 1938 to 1944, trends in high-school supervisory 
practice from 1936 to 1942 (101), and trends in guidance practices (29) 
in public secondary schools and public agencies in New Jersey from 
1932 to 1935-36 to 1940-41 were primarily based on questionnaire studies. 
The status of New York State elementary-school principals was given 
for 1927 and 1941 (172). 

Reinsehl (131) made a statistical study of trends in time allotments 
for school subjects and in length of school days. Rhodes (133) reported 
on supply-demand data based on questionnaire responses of graduates 
of a teachers college. Law (94) listed nine significant trends in teacher 
education. Morrison (109) showed the increase in research activities of
thirty-two state departments of education, and Rockwood (134) suggested
a trend to the use of sociometric and case study approaches in place of the 
historical, descriptive, and typological procedures. 


Attitude Studies 


A number of studies show trends in changes in attitudes. Despert (37), 
using questionnaire responses of parents after a meeting with them, made 
a special study of children’s reactions to the war comparing seventy-two 
nursery-school children who attended the Payne-Whitney school from 
1932 to 1937 with sixty-three who attended the school from 1937 to 
1942. College students’ changes in attitude toward the war were studied
(39, 78, 177), with Dudycha (39) implying that scales of the Thurstone
type are unsatisfactory and Hunter (70) reporting on attitude changes of
college women over four years.


Follow-up Studies 


The significant Brush Foundation Study of Child Growth contributed 
information on the stability of psychometric results from age three 
months to ten years (44) using both genetic and cross-sectional data. 
The correlation of intelligence tests at age three years to age ten years 
in successive half years decreased from .75 to .60. Another Brush study 
(142) reported on serial examination of an identical group of 999 chil- 
dren from age three months to eighteen years in physical growth and 
development. Lorge (97) studied the test-retest results of 131 persons 
in intelligence over a twenty-year period in relation to the number of 
years of schooling obtained. The test-retest correlation was .60, but when 
years of schooling was added the correlation increased to .80. 

Traxler and Selover (156) gave evidence of the decrement of prediction 
of secondary-school achievement from elementary-school achievement over 
time, but noted that prediction in linguistic areas was superior to predic- 
tion in mathematics. 

A follow-up study of Wickman’s 1927 evaluation of behavior problems 
(4) indicated that mental hygienists and teachers are closer together in 
thinking in 1940. For the most part, this is attributed to a growing 
conservatism of mental hygienists. 

Boyce and Bryan (13), by questionnaire, attempted to find out if per- 
sons in later years rate their teachers as they did while in their classes. 
Unfortunately, the study is not properly a follow-up since the subjects 
were asked for retrospective memory with the stated objective to find 
out if pupil’s judgment changes. Additional follow-up studies were made 
of school graduates’ appraisal of school objectives (98), of the high- 
school program (77), of private school graduates (64), of occupational 
activities of high-school graduates from 1892 to 1939 (63), and of rural 
eighth-grade graduates (19).

The Office of Education has made a follow-up survey, by direct interview
and questionnaire, of supplementary trainees for war production (162) 
and of preemployment trainees, including evaluation of their instruction 
in vocational courses (161). Webster (169) appraised the value of voca- 
tional guidance given two to five years previously. 

An analysis of attendance records in relation to test-intelligence and 
academic records was made for the college class of 1935 (152); an 
appraisal of the educational careers of nonreaders was made over eight 
years (93). The leisure interests of students were studied for the period 
1900 to 1930 (86) and for college women over ten years (164). Morrison
(108) studied the holding power of New York State’s rural secondary
schools and the postschool activities of their pupils.

An important follow-up appraisal of the evaluative criteria for secondary 
school was reported in a bulletin of the National Association of Secondary 
School Principals (163). 


Surveys 


A number of surveys of specific areas were made; for instance, a survey 
of the content of general science courses in forty-eight states and 655 
junior and senior high schools (71, 72); of language teaching in Wis- 
consin’s public high schools, covering enrolment, patterns of offerings, 
teaching load, tenure, preparation of teachers, and teacher appraisal of 
objectives (81); of the objectives and organization of elementary-school 
art programs in sixteen cities (95); of planning for education in Ken- 
tucky (139: particularly Chapter 6); of opportunities for community 
study in colleges (117); of evaluational technics for in-service appraisal 
of teachers (165, 166, 167, 168); of religious instruction in Negro col- 
leges (107); of facilities and practices in fifty-eight psychological clinics 
for the diagnosis and therapy of poor reading (84); of the status of rural 
education in the South (36); of the policies and practices of adult edu- 
cation in twenty-one representative cities in California (27); of the 
school organization of seventeen four-year and thirty-four three-year 
junior high schools (82); of the use of radio in 2348 rural and urban 
schools (130); of consumer education courses in secondary schools and 
junior colleges (15); of the teaching of sex education in California 
including extent, nature, personnel, and methods (73); of juvenile delin- 
quency, around the idea of racial segregation and social disorganization, 
in twenty cities (140); and of the frequency of occurrence of items in 
cumulative records in the United States (160: Chapter 1). 

Acceleration under the pressure of war was surveyed in thirty-nine 
representative institutions in relation to admissions, enrolment, faculty, and 
finances (99). Eckelberry (45) reported on the extent of educational 
acceleration in 448 colleges and universities, and Pressey (125) gave a 
first report on the experiences with the accelerated programs in a large 
university. Brandon (14), by questionnaires to college presidents, surveyed 
opinion regarding postwar developments in education.

School surveys have not been published to the extent noted in previous
years. Reller (132) described types of school surveys and suggested their 
relative values. Surveys were made for Newark, N. J. (153), for Tenafly, 
N. J. (38), and for Boston, Mass. (154). A significant survey of the 
opportunities for improving high-school education in Virginia was made 
under the sponsorship of the Virginia Chamber of Commerce (92). 
The high-school seniors were tested with the Cooperative American History,
the Cooperative English, and the Schorling-Clark-Potter Arithmetic tests.

An analysis of drop-outs and causes, of reactions of employers, of school 
facilities, and teaching procedures is given in an appendix of thirty-four 
tables. The San Francisco elementary curriculum survey (11) was basically
a cooperative attempt to evaluate the curriculum. Studies were made
of time allotments; texts and supplementary materials; curriculum trends
over thirty years; and attainment of pupils in arithmetic, spelling,
handwriting, critical thinking, and general information. In the survey
a general guide for classroom observations was developed (11: Appendix A).
Carter (11: Appendix C) studied age-grade distribution in 1929 and 
1939, and Kyte (11: Appendix D) analyzed a checklist of supervisory 
needs of teachers. 

The National Education Association (114) surveyed opinions on com- 
pulsory military training. Holland and Hill (66) made a critical survey 
of the CCC as a youth-serving agency, and Davis and Taylor (34) 
evaluated the high-school N. Y. A. program in Colorado. The place of 
work experience in schools was appraised thru a survey of opinions of 
students, parents, employers, and educators by McDaniel (99). 

Using demographic data, Brunner (17) contrasted the educational 
status in the geographic regions of the United States and also by rural- 
urban comparisons. Morgan (106) summarized enrolment trends in 
California schools during 1942 and 1943; Holy and Wenger estimated adult 
interest in public schools thru their own children (68); Punke (126) sur- 
veyed the socio-economic backgrounds of high-school youths; and Landis 
(89) studied territorial and occupational mobility of Washington State 
youth. He had all eighth-grade children and all social science students 
in high school fill out a questionnaire for every older sibling who had 
completed schooling. Landis (89) measured territorial mobility as the 
difference between place of schooling and first and present job, and 
occupational mobility as the difference between occupation of fathers 
and first and present job. 

Eaton (41) studied Indiana University withdrawals by inquiring into the
reasons with the registrar, the high-school principal, and officials at
chapter houses.
Mooney (103) studied personal problems of freshman women; Young 
(178) surveyed the interests and preferences of primary children for 
movies, comics, and radio, using interviews and analyzing by intelligence, 
age, and economic levels. Feingold (47) made a study of newspaper 
interests and habits of high-school students; Gress (58) tried to evaluate 
the educational needs of potential inductees thru a questionnaire of
men in the armed service; Holtorf (67) surveyed recreational activities
in seven Detroit elementary schools. 

Salley (136) completed a survey and analysis of graduates of teacher-training
schools and of organizations sponsoring preschool groups in
New York City, relating factors of supply and demand. Kaplan (79),
structuring the city of Springfield into ecological areas, made a study of
factors related to adult participation in cultural and educational activities.

Burt (18) used the poll technic to get British schoolmen’s opinions on
educational reforms; Reavis and Cooper (128) made a critical survey 
of merit rating of teachers; and the National Education Association (114) 
contrasted teachers’ salaries with those of other groups. 


Appraisal of Materials 


Bathurst (9) reported the results of a questionnaire appraisal of phono- 
graph recordings; Wiebe (171) indicated the value of the “program 
analyzer” for radio broadcasts; Woelfel and Robbins (174) evaluated the 
use of radio in the classroom; and Reid (129) gave the results of a critical 
appraisal of twenty-six school broadcasts of the Columbia Broadcasting 
System by teachers using “How to Judge a School Broadcast” and by 
the Ohio staff collective appraisal of the program’s educational values, 
understandability, and enjoyment. Kopel (83) described the development 
and use of twenty-five criteria for evaluating reading texts and programs. 


Frequency Studies 


The French Syntax List (20) based on a frequency study of grammatical 
usage in contemporary French prose has appeared. It is based on credits 
for range and frequency. Lindgren (96) studied the frequency of occur- 
rence of foreign words in newspaper English and reported an increase 
from 1930 to 1940. A frequency count of the mention of personages with- 
out explanation or modification made it possible for Shaw (141) to 
suggest the five hundred leading personages, at least in the Reader’s Digest.
He does not give the complete list. 

Smith and his co-workers (147, 148, 149, 150, 151) made a content 
analysis of arithmetic textbooks for five periods: 1790 to 1820, 1821-1850, 
1851-1880, 1881-1910, and 1911-1940, relating these analyses to make-up, 
method, psychology, pedagogy, and the social and economic life of each 
of the periods. The summary (151) indicates trends in arithmetic texts. 


Conclusion 


The variety of material covered in this review was extensive. Basically,
most of the material was good. Surveys, particularly school surveys, con- 
tinue to make recommendations that seem to be unrelated to evidence; 
questionnaire results still are published even when based on less than a 
quarter of those canvassed; trend studies fail to look into the variations
in definition and terminology in successive periods. On the other hand,
many of the studies are analyzed more critically and with proper aware- 
ness of the limitations of the observational data or of the subsequent 
analyses.

CHAPTER IV
Research Methods and Designs 


CHARLES C. PETERS, AGATHA TOWNSEND, and ARTHUR E. TRAXLER 


The experimental studies in education during the past three years may
be divided into three broad groups: (a) studies in which the data were 
presented without indication of the statistical significance of the results; 
(b) studies using conventional procedures such as the difference between 
the mean scores of matched experimental and control groups divided by 
the standard error of the difference; and (c) studies employing newer 
procedures in which the Fisher technics were applied. Apparently, there is 
approximately equal representation of studies using the earlier methods 
and those using the Fisher approach. 

As one would expect, studies appearing in certain journals, e. g., 
School and Society, the School Review, and the Elementary School Jour- 
nal, tend to be simpler in design and much less complicated statistically 
than those published in other journals, e. g., the Journal of Educational 
Psychology, Educational and Psychological Measurement, and the Jour- 
nal of Experimental Psychology. The readers of the first group of journals 
tend to be classroom teachers and administrators who are not highly 
conversant with advanced statistical technics, particularly the newer ones. 
Some research studies having important practical applications are neces- 
sarily presented on an elementary plane; otherwise they would be 
neglected by the very persons for whom their results are potentially the 
most helpful. 

Certain studies whose design is not planned with great attention to 
detail and for which the significance of the results is not given in statisti- 
cal terms may nevertheless appeal to their readers as important and 
significant from the common-sense viewpoint. For example, Riefling (64) 
reported the gains of pupils in Grades IX and X on the Iowa Silent 
Reading Test and the Morrison-McCall Spelling Scale after certain instruc- 
tional procedures were used with the two classes. The gains for four 
months in terms of the gain in the grade equivalents were: average gain 
in spelling, 1.3 grades; and average gain in reading, 1.4 grades. The 
implicit control group was the norming population reported by the test 
authors. Altho nothing was said about the statistical significance of the 
gains, it is probable that most teachers would agree that the results of 
an experiment showing gains of about one-and-a-third grade levels in four 
months are important regardless of tests of statistical significance. 

In another study whose design was simply that of the regular school 
situation and whose results were interpreted on a reasonable basis, Kottmeyer
(41) described an in-service program of improvement in reading
begun in November 1943, and continued during the rest of that school 
year. The study reported the mean scores made on a reading test by the
eighth-grade graduating class of June 1944, and compared the means with
those found for the graduating class in June 1943, before the in-service 
program was begun. The difference in the gains was six-tenths of a grade 
in favor of the class which had taken part in the program of improvement 
in reading. No test of statistical significance was applied; but since about 
four thousand pupils were involved in each testing, the difference in means 
seemed to indicate clearly the worth of the reading program. 
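
The common-sense reading can be checked roughly. The sketch below computes the conventional ratio of a difference in means to its standard error for two independent groups of four thousand pupils each. The standard deviation of 2.0 grade units is an assumption made only for illustration, since no standard deviations are quoted in this review; the function name is likewise hypothetical.

```python
import math


def critical_ratio(mean_diff, sd1, n1, sd2, n2):
    """Difference between two independent means divided by its standard error."""
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return mean_diff / se_diff


# Reported in the review: a difference of six-tenths of a grade,
# with about four thousand pupils in each testing.
MEAN_DIFF = 0.6
N_PER_CLASS = 4000

# ASSUMPTION: a standard deviation of 2.0 grade units in each class.
# This figure is hypothetical and is not taken from the study.
ASSUMED_SD = 2.0

cr = critical_ratio(MEAN_DIFF, ASSUMED_SD, N_PER_CLASS, ASSUMED_SD, N_PER_CLASS)
print(f"critical ratio under the assumed SD: {cr:.2f}")
```

Under the assumed spread the ratio is about 13, so the conclusion would not be sensitive to any reasonable choice of standard deviation when the groups are this large.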

Other studies in which the results were stated and interpreted in the 
common-sense, everyday language of teachers were carried on by Folger 
(23), Guiler and Edwards (25), Guiler and Lease (26), Krause (42), 
Tate (75), and Witty (86). 


Classical Procedures and Minor Modifications of Them 


In this Review there is only brief discussion of studies employing 
standard “classical” procedures, e. g., difference between means or dif- 
ference between proportions divided by the standard error of the differ- 
ence (unfortunately named critical ratio), and simple and partial regres- 
sion. In view of the fact that these methods are well known, more exten- 
sive treatment of the so-called newer methods is made. In no sense does 
this selectivity imply disparagement of the studies involving the so-called 
classical procedures. Among the studies done well by these standard 
methods were those by Corey and Froehlich (12), DiMichael (16), 
Johnson and King (33), Landry (44), Remmers (63), Traxler (79), and 
Woodward (88). Among them was also the significant “Eight-Year Study” 
(6) sponsored by the Progressive Education Association. 

In the majority of the studies employing classical procedures, the 
authors had given the essential data needed by the readers. In a few 
cases, too few data were included in the article. For instance, Ludden 
(46), in an interesting study of juvenile delinquency, gave the critical 
ratios of differences between a delinquent group and a control group 
but did not include means or standard deviations. Thus, one reading 
the article could not check the statements concerning significance, nor 
even know the intrinsic magnitude of the differences. 

Occasionally, studies using a conventional experimental technic fail to 
control other variables which may influence the results. A study by Jones 
(35) of the much debated relationship between reading deficiencies and 
left-handedness illustrated this point. She compared the mean Iowa Silent 
Reading Test scores of 569 right-handed children and fifty-seven left- 
handed children. A very slight difference in favor of the left-handed chil- 
dren was found. The “critical ratio” was 1.4. She correctly concluded that 
her data showed no significant difference between clearly left-handed and 
clearly right-handed children and inferred that no specialized remedial 
reading technics were necessary in handling left-handed children. The 
study is of value on this point, but it would have been more conclusive
if intelligence had been controlled; for it is known that there is a rather
close relationship between intelligence test scores and Iowa reading scores.
The group of right-handed children was large, and it is reasonable to 
expect that it was representative of the intelligence level of the school 
from which this sample was drawn; but, since the group of left-handed 
children contained only fifty-seven cases, there is no assurance that they 
represent a fair sampling of left-handed children in that particular 
community. 
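
A small sketch may make the size of the reported ratio concrete. It converts a critical ratio to a two-sided normal probability and, assuming equal standard deviations in the two groups (an assumption, since none are quoted here), shows how much of the sampling variance of the difference is contributed by the fifty-seven left-handed cases. The function names are hypothetical.

```python
import math


def two_sided_p(critical_ratio):
    """Two-sided normal-curve probability for a critical ratio."""
    return math.erfc(abs(critical_ratio) / math.sqrt(2.0))


def variance_share_of_small_group(n_small, n_large):
    """Share of the variance of a difference in means that is due to the
    smaller group. ASSUMPTION: both groups have the same standard deviation."""
    return (1.0 / n_small) / (1.0 / n_small + 1.0 / n_large)


p_value = two_sided_p(1.4)                      # ratio reported by Jones
share = variance_share_of_small_group(57, 569)  # group sizes reported by Jones

print(f"two-sided p for a ratio of 1.4: {p_value:.3f}")
print(f"share of error variance from the 57 cases: {share:.3f}")
```

A ratio of 1.4 corresponds to a two-sided probability of about .16, and roughly nine-tenths of the error variance comes from the smaller group, which is why the representativeness of those fifty-seven cases matters so much.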

Author misinterpretation of data obtained by conventional procedures 
seems to be rare. One such error was noted in a study by Peterson (60) 
of the scholarship of students housed in various living quarters. He com- 
pared the mean grade-point averages of students living in dormitories, 
rooming houses, cooperatives, fraternities, and at home. He reported the 
difference in means, the probable error of the difference, the difference 
divided by the probable error, and the chances in one hundred that the 
results represented a true difference. In considering dormitory versus 
fraternity house, he found that there were one hundred chances in one 
hundred that the true difference of means was greater than zero. He 
interpreted this difference as follows: “Other things being equal, the 
average student in a dormitory at Davis always will do better scholas- 
tically than if he lived in a fraternity.” One cannot, of course, draw such 
a conclusion concerning the behavior of any individual student. All one
can conclude is that if an infinite number of similar samples were 
drawn from the dormitory and the fraternity-house students at Davis, 
the chances are certain, or practically so, that the grand mean grade 
point average of the dormitory group would be higher than the grand 
mean grade point average of the fraternity-house group. 
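
Because the probable-error convention is less familiar than the standard error, a brief sketch of the conversion may help. It assumes the usual normal-curve relation that one probable error equals 0.6745 standard errors, and it expresses a ratio of difference to probable error as "chances in one hundred" that the true difference exceeds zero. The ratios of 1, 4, and 8 are chosen only for illustration and are not Peterson's figures.

```python
import math

# One probable error equals 0.6745 standard errors on the normal curve.
PROBABLE_ERROR_FACTOR = 0.6745


def pe_ratio_to_z(pe_ratio):
    """Convert difference / probable error into difference / standard error."""
    return pe_ratio * PROBABLE_ERROR_FACTOR


def chances_in_100(pe_ratio):
    """Chances in one hundred that the true difference is greater than zero."""
    z = pe_ratio_to_z(pe_ratio)
    return 100.0 * 0.5 * math.erfc(-z / math.sqrt(2.0))


# Illustrative ratios only (hypothetical, not taken from the study).
for ratio in (1.0, 4.0, 8.0):
    print(ratio, round(pe_ratio_to_z(ratio), 3), chances_in_100(ratio))
```

Even a very large ratio gives a figure that only rounds to one hundred, and in any case the statement concerns the difference between group means, not the performance of any one student.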


Newer Technics 


Certain journals which accept extensive articles, such as those written 
by candidates for the Ph.D. degree, contain many illustrations of the 
application of somewhat more technical, experimental, and statistical
procedures. Among the more sophisticated studies in which the mathe- 
matical procedure was presented in considerable detail were articles 
in the Journal of Experimental Education by Baten and Hatcher (3), 
Clark (8), and Tsao (82). The last study dealt with the relationship 
between grade and age and variability and obtained results contrary to 
the widely accepted common-sense principle that variability increases 
with grade and age. His technics included Neyman-Pearson’s L1 test and
Bartlett’s test of homogeneity. He also discussed Snedecor’s methods deal- 
ing with analysis of variance for unequal subclass numbers and Hoyt’s 
procedure for testing the variability affected by test materials. The ques- 
tion investigated by Tsao is of considerable theoretical importance and 
should be studied further. 

One of the best illustrations of the application of Fisher’s t test (a
minor modification of the so-called critical ratio) to an extensive set
of results was given in a series of four articles by Rulon and others (66,
67). The authors compared phonographic recordings with printed mate- 
rial in terms of knowledge gained thru their use alone, in terms of knowl- 
edge gained thru their use in a teaching unit, and in terms of motivation 
to further study. They also investigated the effect of phonographic record- 
ings upon attitude. 

Occasionally a new experimental technic is superior to a conventional 
one in getting all possible significance out of the data. In a study of the 
effect of type of desk on results of machine scored tests, Traxler and Hil- 
kert (80) compared the mean scores made on the machine-scoring form 
of the American Council Psychological Examination in Grades IX thru 
XII by two groups of pupils, one group taking the test at desks and 
the other group taking it in chairs with desk arms. Five pairs of groups 
were selected at random and two additional pairs were matched on the 
basis of Otis IQ. The significance of the difference in mean scores was 
interpreted in terms of the difference divided by the probable error of 
the difference. All differences were in favor of the desk group, but they 
were small; and only one was as much as four times its probable error. 
It was concluded that type of desk made slight difference in the results. 
Kelley (38) applied additional statistical technics to Traxler and Hilkert’s 
data. He pointed out that a method was needed combining all of the data 
with due regard to the sign as well as the magnitude of each difference, 
and one that would profit by and investigate the regularity of change 
in difference from grade to grade. Using a regression method, Kelley 
found a p of .0016 at the ninth-grade level as compared with a p of .11 
found from Traxler and Hilkert’s data when they used cases at the 
ninth-grade level alone. Thus, according to Kelley’s method, the difference 
at the ninth-grade level was clearly significant, whereas the significance 
of the difference was not apparent when the more usual experimental 
procedure was employed. The difference at the twelfth-grade level was 
slight and could not be established on the available data, but Kelley 
thought that there was some evidence that the regression was curvilinear 
and that the desk group continued to have an advantage at the upper-grade 
levels. In another article, Kelley (37) developed a test of variance ratios 
of components which was used by Davis (14) in connection with a factor 
analysis study of the Cooperative Reading Comprehension Test. 


Analysis of Variance 


Out of the several dozen studies using analysis of variance in some form 
we can select only a few typical ones for comment. McGurk (47) compared 
test scores of Negro children with those of white children by a three-part 
analysis. A 10 percent sample from more than 13,000 whites and 6000 
Negroes, ages nine to seventeen, were compared by three different intelli- 
gence tests, with age partialed out. The classical method of partialing out 
the age factor would have been by matching the races according to age
group and using in the standard error of the difference formula the third
term involving correlation between the intelligence test scores of the two 
groups thus paired. McGurk accomplished identically the same effect, 
according to the rules of a three-part analysis of variance, by taking out 
a sum of squares “between ages” and subtracting this from the two-part 
residual to get the purified estimate of error (59: 295). The eighteen F’s 
obtained by dividing the estimates of the population variance “between 
races” by this purified “error estimate” all showed highly reliable differ- 
ences between the whites and the Negroes, reaching in every case beyond 
the .001 level of confidence. Since McGurk was intending to make six com- 
parisons with each of the three intelligence tests, this procedure involved 
economy of time as compared with the traditional t test because by thus
pooling the error estimate for the six comparisons a single estimate would 
serve for all. But he had to assume (as always in analysis of variance 
methods) that the true variance was the same in each of the age groups. 
Altho such an assumption may not be strictly true, it is not too extreme 
for a first approximation in situations where the reliability was sufficiently 
high. 
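
The effect of taking out a "between ages" sum of squares can be shown in a minimal sketch. The table below is entirely hypothetical (two groups, four age levels, one score per cell) and is not McGurk's data; the function name is also an assumption. The sketch compares the F for groups against the two-part residual with the F against the residual after the age sum of squares is removed.

```python
def three_part_anova(table):
    """table[g][a] is the score for group g at age level a (one per cell).

    Returns the F ratio for groups against the two-part residual
    (total minus groups) and against the three-part residual
    (total minus groups minus ages).
    """
    n_groups = len(table)
    n_ages = len(table[0])
    n = n_groups * n_ages
    grand = sum(sum(row) for row in table) / n

    ss_total = sum((v - grand) ** 2 for row in table for v in row)
    ss_groups = n_ages * sum((sum(row) / n_ages - grand) ** 2 for row in table)
    age_means = [sum(table[g][a] for g in range(n_groups)) / n_groups
                 for a in range(n_ages)]
    ss_ages = n_groups * sum((m - grand) ** 2 for m in age_means)

    # Two-part analysis: age differences remain inside the error term.
    ss_resid_two = ss_total - ss_groups
    df_resid_two = n - n_groups

    # Three-part analysis: the age sum of squares is removed from the error.
    ss_resid_three = ss_resid_two - ss_ages
    df_resid_three = (n_groups - 1) * (n_ages - 1)

    ms_groups = ss_groups / (n_groups - 1)
    return {
        "F_two_part": ms_groups / (ss_resid_two / df_resid_two),
        "F_three_part": ms_groups / (ss_resid_three / df_resid_three),
    }


# HYPOTHETICAL scores: rows are two groups, columns are four age levels.
scores = [
    [10, 12, 14, 17],
    [8, 10, 13, 14],
]
result = three_part_anova(scores)
print(result)
```

In this invented table the scores rise steadily with age, so the two-part error is inflated and the group difference is masked; once the age sum of squares is removed, the same difference stands out clearly.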

Hartkemeier (29) studied the factors which affect teachers’ salaries. In
a correct and effective use of analysis of variance he compared the salaries
of men and women in schools of different sizes, according to years in 
present position, and the extent of interaction of these factors. He found 
highly significant differences based on sex, size of school, and years in 
present position. He also made, with fair success, a commendable effort
to explain to his readers the statistical technics he employed. Fischer (21) 
studied certain interests of college students by means of value-scores on 
the Allport-Vernon scale. By a two-part analysis of variance he found 
highly significant differences between the mean value-scores on the six
tests. Because the same persons marked all the sections of the test, the scores 
were probably intercorrelated. This possibility suggests the need for a 
three-part rather than the two-part analysis he made, so as to account for
the probable interaction.

Musselman (53) investigated the factors associated with the achievement
of high-school pupils of superior intelligence. He applied the F test
to twenty-six tables of data. Then, if the F was significant, he reported an 
inspectional interpretation of the nature of the relation. In the case of 
some of these factors, as different nationalities, place of residence, or type 
of discipline used in the home, the F test was exactly the one needed 
because the classifications represented no quantitative ordering; the means 
of classes could be viewed only as varying at random. But in the case of 
others, as size of family, parents’ education, socio-economic status, or 
scores on a personality inventory, the F test is not strictly appropriate 
because here there is quantitative ordering of the classes and the probability 
of deviation of the means of classes in a systematic manner. Particularly 
in view of the fact that each time he distributed the whole of the sample 
among the k classes with, consequently, unequal n’s, the correct way to
deal with material of this second category is curve fitting, and a test of
the significance of the departure of the constants of the equation of the 
curve from zero—the correlation coefficient for a straight line fit and the 
newly devised parabolic correlation coefficient, or some more complex 
curve, for curvilinear regression. These more appropriate tests are more 
sensitive in showing statistical significance. Musselman’s sample, however, 
was so large that he probably did not miss any important relation by using 
the cruder F test. 
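
The greater sensitivity of a trend test for ordered classes can be illustrated with a short sketch. The data are hypothetical (four ordered classes of three cases each) and are not Musselman's; the function names are assumptions. The sketch compares the ordinary between-classes F, which ignores the ordering, with the F for a straight-line fit computed from the correlation coefficient as r²(N − 2)/(1 − r²).

```python
def one_way_f(groups):
    """Between-classes F ratio, ignoring any ordering of the classes."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand = sum(values) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def linear_trend_f(xs, ys):
    """F ratio (1 and N - 2 degrees of freedom) for a straight-line fit,
    obtained from the squared correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r_squared = sxy ** 2 / (sxx * syy)
    return r_squared * (n - 2) / (1.0 - r_squared)


# HYPOTHETICAL data: four ordered classes (for example, levels of
# parents' education coded 1 to 4) with three achievement scores each.
classes = {1: [4, 5, 6], 2: [5, 6, 7], 3: [6, 7, 8], 4: [7, 8, 9]}

f_between = one_way_f(list(classes.values()))
xs = [level for level, scores in classes.items() for _ in scores]
ys = [score for scores in classes.values() for score in scores]
f_trend = linear_trend_f(xs, ys)

print(f"between-classes F: {f_between:.2f}")
print(f"straight-line F:   {f_trend:.2f}")
```

Because the class means in this invented example rise in a straight line, the whole of the between-classes sum of squares is concentrated in one degree of freedom, and the trend F is several times the omnibus F.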

Newell (54) studied the relation between class size and extent of inven- 
tion and diffusion of educational adaptations, using as sample nine classes 
in each of four wealthy New Jersey communities. She used a three-part 
analysis of variance effectively for this purpose. Then the author turned 
to the attempt to apply analysis of variance to the determination of the 
critical level of class size for adaptation, without very good results. A far 
better technic for this purpose, particularly if there is a reasonably large
sample of schools, would have been to fit an appropriate curve to scores 
on adaptations on the Y-axis and class size on the X-axis, determine its 
equation by the method of least squares, equate the first derivative of this 
equation in respect to X to zero, and solve for the X-value that corresponds 
to maximum Y. A corresponding operation with the second derivative 
would have shown thru what class sizes the change in invention and dif- 
fusion was most rapid or least rapid. 
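
A minimal sketch of the procedure just described follows, under stated assumptions: the class sizes and scores are invented for illustration, a parabola is taken as the "appropriate curve" for the first-derivative step, and a cubic is used for the second-derivative step (a parabola has a constant second derivative). The helper name `fit_polynomial` is hypothetical. X is centered on its mean before fitting, which only improves the arithmetic.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [a0, a1, ...] for Y = a0 + a1*X + a2*X^2 + ..."""
    m = degree + 1
    # Normal equations built from the power sums of X.
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            factor = a[row][col] / a[col][col]
            for c in range(col, m):
                a[row][c] -= factor * a[col][c]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


# Hypothetical data: class size (X) and adaptation score (Y).
class_sizes = [15, 20, 25, 30, 35, 40]
center = sum(class_sizes) / len(class_sizes)
u = [x - center for x in class_sizes]

# Parabola: dY/dX = a1 + 2*a2*X = 0 gives the stationary point
# (a maximum only when a2 is negative).
scores = [100 - 0.5 * (x - 27) ** 2 for x in class_sizes]
a0, a1, a2 = fit_polynomial(u, scores, 2)
best_size = center - a1 / (2 * a2)

# Cubic: d2Y/dX2 = 2*c2 + 6*c3*X = 0 gives the point of inflection,
# where the rate of change in Y is at its extreme.
scores_cubic = [50 + 2 * (x - 27) - 0.01 * (x - 27) ** 3 for x in class_sizes]
c0, c1, c2, c3 = fit_polynomial(u, scores_cubic, 3)
inflection_size = center - c2 / (3 * c3)

print(best_size, inflection_size)
```

With real data the fitted constants should themselves be tested for significance before the critical class size is interpreted.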

Chang (7) studied the ability of Chinese children to read Chinese print 
when in the vertical as compared with the horizontal arrangement. He 
investigated the effect of arrangement, of different materials, and of grade 
levels as main effects, and also the first and second order interactions. He 
found the F’s exceedingly small for all of the interactions, ranging from 
.0007 to .91 in all of the sixteen instances. These he interprets as showing
no statistically significant interactions, correctly as the term “interaction” 
is employed. If these F ratios had been inverted, the F’s would have been 
highly significant. That would have meant that the abilities to read the 
different arrangements by the same pupils were highly correlated, and 
likewise with the different types of materials—in some cases even nearly 
perfectly correlated (59: 292). But the interplay of the factors represented
by high positive correlation is not the kind of interplay that is named inter- 
action; such high, positive correlation means that the factors behave alike 
straight down the line, while “interaction” covers exactly the opposite kind 
of fact. When the F ratio is set up, as it is for testing interaction, with the 
interaction variance estimate in the numerator and an error estimate in 
the denominator, a fractional F means high correlation; and as the F 
moves up in value thru 1 to a sufficient size, the correlation between the 
factors decreases and even passes to negative. Thus high F’s mean low 
correlation, or a negative correlation, among the factors, and that there- 
fore the factors behave differently in different combinations. 

McNally (48) used the Latin square design to study the readability of
certain type sizes and forms in sight-saving classes, using six sets of 6x6 
Latin squares. He found highly significant differences among individuals
and among randomly made test forms but entirely insignificant differ- 
ences between type sizes, both in effect upon speed of reading and upon 
number of eye blinks. Since this finding was very surprising, the author 
tested the individual pairs of type size by the conventional t-ratio procedure. 
All but three of the sixty differences were quite insignificant and these did 
not seem reasonable or consistent, thus confirming the analysis of variance 
showing. The author correctly recognized that the equivalent of the taking 
out of the marginal sums of squares in the Latin square is, in the case of 
the comparison of the means of classes two by two, the three-term standard 
error formula which contains the correlation term. It would have been 
necessary to make the paired comparisons if the F had been significant, 
and the author is to be commended for anyway exploring the possibility 
“that some true differences were hidden in the data.” For a nonsignificant 
F for a plurality of classes merely means that, on the average, the separate 
comparisons would not be significant and that the separate t²’s have their
distribution within the sampling distribution of a true F of 1.00. Snedecor
advises that if, under these conditions, significant individual t’s are found
that seem plausible, they should not be trusted from the present study but 
should be subjected to further follow-up experimentation. 
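
The three-term standard error formula mentioned above can be illustrated briefly. The scores are hypothetical (the same five readers under two type sizes), and the function names are illustrative only. The sketch shows that the formula with the correlation term gives the same t as working directly with each reader's difference, which is why it removes individual differences in the same way the Latin square does.

```python
import math


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def correlation(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)
    return cov / math.sqrt(variance(a) * variance(b))


def t_three_term(a, b):
    """t using SE_diff^2 = SE_a^2 + SE_b^2 - 2 * r * SE_a * SE_b."""
    n = len(a)
    se_a = math.sqrt(variance(a) / n)
    se_b = math.sqrt(variance(b) / n)
    r = correlation(a, b)
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2 - 2 * r * se_a * se_b)
    return (mean(a) - mean(b)) / se_diff


def t_from_differences(a, b):
    """t computed from the paired differences themselves."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / math.sqrt(variance(d) / len(d))


# Hypothetical reading scores for the same five pupils under two type sizes.
type_a = [12, 15, 11, 18, 14]
type_b = [10, 14, 10, 15, 13]

t_formula = t_three_term(type_a, type_b)
t_paired = t_from_differences(type_a, type_b)
print(t_formula, t_paired)  # both about 4.0 for these figures
```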

Peterson (61) also used the Latin square, to investigate the effect of 
different combinations of reading and recitation upon immediate and 
delayed recall of long prose selections. 

The character, virtues, and awkwardnesses of the “new” designs are 
typically illustrated in the Wisconsin study of Radio in the Classroom (83). 
This study found almost no statistically significant differences between the 
radio groups and the control groups. The analysis of variance and the 
chi-square technics are carried out and written up quite typically. The 
write-up is hard to read because one must hunt thru the book for the 
meaning of the abbreviations, tho the authors do help the reader by indicating
at least the sign of the differences between means of the classes
compared under the heading “value of the effect.” Among other researchers 
employing analysis of variance are Knott and Tjossem (40), Selover (70), 
and Thompson and Hunnicutt (76). 

The only case of correctly named analysis of variance encountered was 
by Hamilton (27); all the others were, as is the custom with this termi- 
nology, analyses of sums of squares and comparison of estimates of the 
population variance. Hamilton analyzed the total sample variance of 
achievement in three learning tasks thru various amounts of practice into 
the proportion due to individual differences (“within group”) and that 
due to amount of practice (“between means” at successive levels). She 
was not interested in tests of significance but in the question of the per- 
centage of the variance due to these two factors, and the study of whether 
this percentage allocation was constant or varied according to circum- 
stances. She concluded that the latter was the case. 
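
The kind of percentage allocation described here can be sketched as follows. The figures are hypothetical and do not come from Hamilton's study, and the function name is illustrative; the sketch simply divides the total sum of squares into a "between means" part and a "within group" part and expresses each as a percentage.

```python
def variance_shares(groups):
    """Percent of the total sum of squares lying between and within groups."""
    scores = [y for g in groups for y in g]
    grand_mean = sum(scores) / len(scores)
    ss_total = sum((y - grand_mean) ** 2 for y in scores)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = ss_total - ss_between
    return 100 * ss_between / ss_total, 100 * ss_within / ss_total


# Hypothetical achievement scores at four successive amounts of practice.
practice_levels = [[10, 12, 14], [12, 14, 16], [14, 16, 18], [16, 18, 20]]
pct_practice, pct_individuals = variance_shares(practice_levels)
print(round(pct_practice, 1), round(pct_individuals, 1))  # 65.2 34.8
```

Repeating the computation for different tasks or stages of practice would show whether the allocation stays constant.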

Covariance


The statistic “covariance” is defined as $\sum xy / N$, where x is a deviation
from the mean of the X-variates and y is a deviation from the mean of the
Y-variates. Obviously the covariance is the basis of our familiar regression
equation, for $b_{yx}$ equals $\sum xy / (N\sigma_x^2)$. Analysis of covariance is employed
when end scores or end means are to be adjusted by means of the straight 
line regression equation for differences in some equating factor. 
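
As a small worked illustration of these definitions, the sketch below computes the covariance and the regression coefficient of Y on X from deviations. The five pairs of scores are hypothetical, and the function names are illustrative.

```python
def covariance(xs, ys):
    """Sum of products of deviations, divided by N."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def regression_slope(xs, ys):
    """b_yx = covariance / variance of X (both with divisor N)."""
    return covariance(xs, ys) / covariance(xs, xs)


# Hypothetical paired scores.
x_scores = [1, 2, 3, 4, 5]
y_scores = [2, 4, 5, 4, 5]

cov_xy = covariance(x_scores, y_scores)     # 6 / 5 = 1.2
b_yx = regression_slope(x_scores, y_scores)  # 1.2 / 2.0 = 0.6
print(cov_xy, b_yx)
```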

Three available procedures for thus adjusting scores or means for differ- 
ences between groups on some initial criterion of learning ability, which 
come under the category of analysis of covariance, were employed in the 
period here under review. These methods all accept groups of unequal N’s 
and/or of unequal mean abilities on the matching factors and make sta- 
tistical adjustments for these differences, thus replacing the loss of subjects 
involved in the older matching method. The Fisher method of covariance 
was employed by a number of researchers, among them McNiel (49), 
Rostker (65), Stewart (72), and Willits (85), the Stewart write-up being 
particularly well done. This method is adapted to comparison of either 
two or more method groups and on either one or more equating variables, 
the simple regression equation being used in the case of one equating 
variable and the partial regression equation in the case of more than one. 
It employs for adjustment a regression equation made by combining 
experimental and control groups and is for that reason particularly well 
adapted to small samples because pooling the several groups yields a 
more stable regression equation than one alone would give. But it has 
as disadvantages (a) the fact that it must get the adjusted “between” 
variance estimate by subtracting sums of squares and must therefore make 
the F test rather than the more meaningful t test; and (b) it does not lend
itself to diagnosis of the achievements as both of the other methods do. 
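
The adjustment step of the pooled-regression approach can be sketched as below, for the simplest case of two groups and one equating variable. The scores and function names are hypothetical. Only the adjustment of the end means is shown; the F test on the adjusted sums of squares is not reproduced here.

```python
def pooled_within_slope(groups):
    """Regression slope from within-group sums, pooled over all groups."""
    sxy = sxx = 0.0
    for xs, ys in groups:
        mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
        sxy += sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        sxx += sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def adjusted_means(groups):
    """End means adjusted to the grand mean of the equating variable."""
    slope = pooled_within_slope(groups)
    all_x = [x for xs, _ in groups for x in xs]
    grand_mean_x = sum(all_x) / len(all_x)
    return [sum(ys) / len(ys) - slope * (sum(xs) / len(xs) - grand_mean_x)
            for xs, ys in groups]


# Hypothetical (initial score, end score) data for two small groups.
experimental = ([1, 2, 3], [3, 5, 7])
control = ([3, 4, 5], [5, 7, 9])

adj_experimental, adj_control = adjusted_means([experimental, control])
print(adj_experimental, adj_control)  # 7.0 5.0
```

In this invented case the raw end means favor the control group (7 against 5), but after allowance for its higher initial scores the adjusted means favor the experimental group.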

The Peters’ regression technic is best illustrated by the study by Van 
Voorhis (58, 84). Van Voorhis experimented with the possibility of im- 
proving a supposedly “primary mental ability’—space perception. He 
gave, to an experimental group of forty members in descriptive geometry, 
systematic training in visualizing space relations and compared their 
mean gain on the Thurstone test of this factor with that of a control group 
of forty-four members and also the means of grade points earned in descrip- 
tive geometry by the experimental and control groups. A partial regression 
equation for predicting end scores from a team of five matching factors 
was built up from the statistics of the control group; then the scores of the 
members of the experimental group on the five matching factors were 
employed in the regression equation to predict what end scores they should 
make if the experimental factor had no differential effect. He found that 
the experimental group exceeded “expectation” by a mean of 17.25 points 
on the Thurstone test and .65 in grade-point average in descriptive 
geometry. These differences yielded standard error ratios (by the Peters’ 
formula) of 3.40 and 3.09 respectively, where the standard error ratio has 
the same meaning as Student’s t. So the differences between the “expected”
means and the attained ones were highly significant statistically on both 
of the measured outcomes. This is exactly the same standard error ratio as 
would have been achieved if the groups could have been handled as 
perfectly matched groups in a matched group experiment. The Peters’ 
technic differs from the Fisher analysis of covariance in making its regres- 
sion equation from the statistics of the control group rather than from the 
experimental and control groups pooled, on the ground that a pooled esti- 
mate would be a meaningless hybrid if the two groups differed by reason 
of the experimental factor (as they almost certainly would in the total 
population) ; and it also differs by permitting an analysis of the differential 
achievement. Case studies can be made of those individuals who exceed or 
fall short of expectation, for these discrepancies are indicated for each 
subject; or statistical studies of these differential effects can be made by 
fitting simple or partial straight or curved line regression equations to the 
end-score deviations from expectation and the matching factors as co- 
ordinates. Or, if such analysis is not wanted, the individual predictions 
need not be made; the regression equation can be applied directly to the 
differences between experimental and control group means on the matching 
factors for the purpose of finding the adjustment for the end test means, 
just as in the Fisher method. 
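
The logic of predicting "expected" end scores from the control group alone can be sketched as follows. This is a simplification under stated assumptions: a single matching factor is used where the study described above used five, the scores are invented, and the function names are illustrative. The standard error formula is not reproduced, since the text does not state it.

```python
def fit_control_regression(xs, ys):
    """Simple regression of end score on the matching factor, control group only."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def excess_over_expectation(control, experimental):
    """Each experimental subject's end score minus the score predicted for him."""
    intercept, slope = fit_control_regression(*control)
    xs, ys = experimental
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical (matching factor, end score) data.
control_group = ([1, 2, 3, 4], [2, 4, 6, 8])
experimental_group = ([2, 3, 5], [6, 9, 12])

discrepancies = excess_over_expectation(control_group, experimental_group)
mean_excess = sum(discrepancies) / len(discrepancies)
print(discrepancies, mean_excess)
```

The individual discrepancies are what make the case studies and the further regression analyses possible; the mean excess alone corresponds to the adjusted difference between groups.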

Instead of making separate steps of the adjustment of end results for 
differences in matching factors and the diagnosis of the differential out- 
comes, as the Peters’ method does, the Johnson-Neyman method combines
into one process these two steps by determining “regions of significance.”
The Johnson-Neyman technic is very ably used and clearly written up
in a study by Deemer and Rulon (15), in which they compare the effec- 
tiveness of two shorthand systems. This method, when used with two 
matching factors, yields equations for curves from which graphs can be 
laid out on a plane surface bounding regions for the matching factors 
within which the differences between experimental and control groups on 
end scores are significant at the designated levels of confidence. If there 
is any law governing the kind of subjects for which one method rather 
than the other is significantly better (e.g. bright pupils with good socio- 
economic background favor one method; dull ones with poor socio- 
economic background favor the other), these graphs mark off the bound- 
aries within which the one method or the other is significantly better. 
Individual pupils may be located on this plot and thus the potency of 
the method may be inferred for them. In the typical study the method is 
used with two matching factors only; with one matching factor the 
individuals would be located along a linear continuum, while with three 
or more they would need to be posited in a form in space of three or 
more dimensions, in some form of conicoid (ellipsoid, paraboloid, or
hyperboloid). 
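
For the one-matching-factor case, where subjects lie along a linear continuum, the idea of a region of significance can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the computing routine of any study cited: separate straight lines are fitted to the two groups, a common residual variance is pooled, and the critical t is taken from a table and passed in by hand. All data and function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares line for one group, with the sums needed later."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return {"n": n, "mean_x": mean_x, "sxx": sxx,
            "intercept": intercept, "slope": slope, "sse": sse}


def t_ratio_at(x, g1, g2):
    """Difference between the two fitted lines at x, over its standard error."""
    s2 = (g1["sse"] + g2["sse"]) / (g1["n"] + g2["n"] - 4)
    diff = (g1["intercept"] - g2["intercept"]) + (g1["slope"] - g2["slope"]) * x
    var = s2 * (1 / g1["n"] + 1 / g2["n"]
                + (x - g1["mean_x"]) ** 2 / g1["sxx"]
                + (x - g2["mean_x"]) ** 2 / g2["sxx"])
    return diff / math.sqrt(var)


def significance_boundaries(g1, g2, t_crit):
    """Values of x where the t ratio equals t_crit (roots of a quadratic)."""
    k = t_crit ** 2 * (g1["sse"] + g2["sse"]) / (g1["n"] + g2["n"] - 4)
    da = g1["intercept"] - g2["intercept"]
    db = g1["slope"] - g2["slope"]
    qa = db ** 2 - k * (1 / g1["sxx"] + 1 / g2["sxx"])
    qb = 2 * da * db + 2 * k * (g1["mean_x"] / g1["sxx"] + g2["mean_x"] / g2["sxx"])
    qc = da ** 2 - k * (1 / g1["n"] + 1 / g2["n"]
                        + g1["mean_x"] ** 2 / g1["sxx"]
                        + g2["mean_x"] ** 2 / g2["sxx"])
    disc = qb ** 2 - 4 * qa * qc
    if qa == 0 or disc < 0:
        return None  # no finite pair of boundaries at this level
    root = math.sqrt(disc)
    return sorted([(-qb - root) / (2 * qa), (-qb + root) / (2 * qa)])


# Hypothetical (matching factor, end score) data for two methods.
method_a = fit_line([1, 2, 3, 4, 5, 6], [2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
method_b = fit_line([1, 2, 3, 4, 5, 6], [4.0, 4.1, 3.9, 4.2, 4.0, 4.1])

T_CRIT = 2.306  # tabled t at the 5 percent level for 8 degrees of freedom
bounds = significance_boundaries(method_a, method_b, T_CRIT)
print(bounds)
```

For these invented figures the two lines cross, so the methods do not differ significantly between the two boundaries, while below the lower boundary one method is significantly better and above the upper boundary the other is.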

Some of the additional researchers who used the Johnson-Neyman
technic are Clark (8), Hansen (28), Johnson (32), and Treacy (81). 

Uses of Chi-square


As would be expected, a considerable number of studies employed chi- 
square, of which only a few can be reviewed. Daniel (13) had respondents 
declare interest in a list of one hundred items for a library by five 
categories, as follows: strongly yes; yes; indifferent; no; strongly no. 
Responses were set up for each item separately for males and females 
involving the five types of responses, thus making 2x5 contingency tables, 
and a chi-square was computed for each of the items on the differentiation 
of men from women. Those in which the males showed greater interest 
than the females were then tabled in descending order of chi-squares. 
A corresponding table was made for those in which the proportion of 
women declaring interest was greater. Daniel then set up a table of the 
distribution of these one hundred chi-squares and tested the departure of 
this distribution from that which would be yielded by random samples 
in harmony with the null hypothesis according to the mathematically 
known distribution of chi-square. This same procedure was used by 
several other authors. Altho there is some plausibility to this procedure, 
it does not appear to the reviewer as appropriate for getting an over-all 
determination (if an over-all is really needed) as additive chi-square 
would be; and certainly it is more laborious. 
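
The additive alternative mentioned here can be sketched as follows: compute chi-square for each item's 2x5 table, then add the chi-squares and their degrees of freedom for a composite. The two tables are hypothetical, and the function names are illustrative; the composite must still be referred to the chi-square table with the summed degrees of freedom.

```python
def chi_square(table):
    """Chi-square and degrees of freedom for an r x c table of frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    value = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            value += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return value, df


def additive_chi_square(tables):
    """Composite chi-square: sum of the separate values and of their d.f."""
    results = [chi_square(t) for t in tables]
    return sum(v for v, _ in results), sum(d for _, d in results)


# Hypothetical responses (strongly yes, yes, indifferent, no, strongly no)
# for males (first row) and females (second row) on two items.
item_1 = [[10, 20, 30, 20, 10],
          [20, 20, 30, 20, 0]]
item_2 = [[15, 25, 20, 20, 10],
          [15, 25, 20, 20, 10]]

chi_1, df_1 = chi_square(item_1)
total_chi, total_df = additive_chi_square([item_1, item_2])
print(chi_1, df_1, total_chi, total_df)
```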

Beery (5) had 953 respondents from six different type groups indicate 
agreement or disagreement on 276 propositions regarding the implica- 
tions of democracy, the items being divided into three random forms. 
Within each of these six groups he made t² tests to investigate the hypothesis
of a fifty-fifty agreement or rejection. Then he made an over-all
test by adding these t²’s for a composite chi-square. This was a correct
procedure. Beery appears to be one of the very few researchers who recognizes
that, for one degree of freedom when dealing with frequencies,
chi is identical with t as a deviate in a normal distribution when used
for testing the same hypothesis. Other researchers reporting during the 
cycle (e.g. 82) put the test of proportions arising out of frequencies in 
terms of chi-square with one degree of freedom even when addressing 
readers who could not be expected to understand this statistic, apparently 
believing that there is greater exactness in the distribution of chi-square 
than in that of the critical ratio from proportions, or differences between 
proportions. In fact, when the standard error formula for differences 
between proportions is correctly stated and applied (with the p made 
the hypothetically true one or made as a pooled estimate from the sample) 
the standard error ratio from it interpreted from the normal curve table 
is identical with that from chi-square to the hundredth decimal place 
and beyond. Students of elementary statistics know that the distribution 
of proportions, and of differences between proportions, is not normal,
particularly when p is small; and they employ the normal tables for 
interpreting the critical ratios from them with apologies. It is only 
because the distribution of chi as tabled assumes an infinite N (where 
N is the total number of observations in the sample, in contrast with n
which is based upon the number of cells into which the total N is grouped) 
that its curve is smooth instead of a polygon and that its distribution 
is normal regardless of the p; for all samples less than infinite in size 
the distribution of chi (and of chi-square) has exactly the same limita- 
tions as those of the critical ratio from proportions or differences between 
proportions. The two are algebraically identical. If researchers could get 
over the false sense of security arising from lack of information about the 
assumptions back of the chi-square distribution, they might wish to make 
less use of that statistic where more straight-forward alternatives, or 
alternatives with a more constructive meaning, are available, such as 
differences between proportions or tetrachoric correlation coefficients. 
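
The algebraic identity asserted above can be checked numerically. In the sketch below the fourfold table is hypothetical and the function names are illustrative; the critical ratio uses the pooled estimate of p in its standard error, and no continuity correction is applied to chi-square.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Chi-square (1 d.f., no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    return sum((observed[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(2) for j in range(2))


def critical_ratio(a, b, c, d):
    """Difference between two proportions over its standard error, pooled p."""
    n1, n2 = a + b, c + d
    p1, p2 = a / n1, c / n2
    p = (a + c) / (n1 + n2)
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# Hypothetical frequencies: agree / disagree in two groups of thirty.
cells = (10, 20, 20, 10)
chi2 = chi_square_2x2(*cells)
ratio = critical_ratio(*cells)
print(chi2, ratio ** 2)  # the two values agree
```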

Arsenian (2) and Lange (45) employed mean-square contingency cor- 
relation coefficients, using chi-square as their foundation. Among others 
employing the chi-square technic were Drummond (18), Everote (19), 
Hunnicutt (30), Katona (36), and Postman and Murphy (62). 

Many miscellaneous technics departing from the well-known conven- 
tional ones were employed. Some of these involved interpretations or 
transformations of the correlation coefficient so as to supplement its 
customary interpretation. LaGrone (43) supplemented r’s by showing 
the difference between means of a dependent factor when classified into 
uppermost and lowest quarters on the independent one. Strang (73) 
showed means and standard deviations of classes arranged in hierarchical
order where customarily coefficients of correlation would have been 
employed. Many writers transformed correlation coefficients into their 
corresponding hyperbolic arc-tangents (z) before working with their 
reliabilities. Swensen (74) correlated difficulty values of one hundred 
arithmetic items after training with difficulty values before training in 
three different groups drilled by different methods, with the purpose of 
comparing the change between before and after r’s among the three groups. 
For this comparison she transformed the r’s into z’s and obtained the 
standard error of the difference by the conventional standard error 
formula for random groups—a necessary formula here because the cor- 
relation coefficient needed for the third term where arrays are matched 
(as they were here in each before-and-after) is not known in the case 
of z. DiMichael (16) working with a problem that required the reliability 
of differences between r’s, employed the correct formula involving the 
correlation between the r’s for matched arrays. DiMichael gained more 
precision by using the correct formula without transformation than 
Swensen did by the slight advantage from the transformation when made 
at the expense of the impossibility of using the correct formula for the 
standard error of the difference. Fahey (20) displayed both correlation 
ratios and correlation coefficients for numbers of questions asked by 
pupils in class and twenty-three factors such as age, IQ, grade in course,
and reading comprehension, and found the eta’s differ markedly in many
instances from the r’s.
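
The transformation of r into z and the standard error formula for random (independent) groups can be sketched as below. The correlations and sample sizes are hypothetical, and the function names are illustrative. The sketch covers only the independent-groups case; it does not supply the correlation term needed when the arrays are matched.

```python
import math


def fisher_z(r):
    """Hyperbolic arc-tangent of r."""
    return math.atanh(r)


def se_difference_independent(n1, n2):
    """Standard error of z1 - z2 when the two samples are independent."""
    return math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))


def ratio_for_difference(r1, n1, r2, n2):
    """Difference between the two z's divided by its standard error."""
    return (fisher_z(r1) - fisher_z(r2)) / se_difference_independent(n1, n2)


# Hypothetical correlations from two independent groups of 103 cases each.
z_half = fisher_z(0.5)
se_diff = se_difference_independent(103, 103)
ratio = ratio_for_difference(0.5, 103, 0.0, 103)
print(z_half, se_diff, ratio)
```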

Woodrow (87) made a multiple factor analysis of the intercorrelations
among six school subjects in Grade V, VI, and VII in two cities, finding 
no general factor and group factors with only low communalities. Conant 
(10) used multiple factor analysis on reading tests. Bayle (4) investi- 
gated regressive eye movements in reading by studying the several pat- 
terns into which the behaviors seemed to fall. Knipp (39) made a survey 
of the methods and designs employed in experiments on the teaching of
arithmetic from 1911 to 1940. Cook (11) studied the stratification of a 
tenth-grade class by a type of sociometric technic of which educational 
researchers would do well to take cognizance. Selover (70) and Baten 
and Hatcher (3) used Fisher’s discriminant function. 

Several experimenters used a plurality of pairs of classes in their 
studies rather than merely pairs of individual subjects. Some of these 
give evidence on the inconsistency of the outcomes in the several schools. 
Anderson, Marcham, and Dunn (1) employed a “telling” method of in- 
struction versus a “doing” method in fourteen pairs of classes in eight 
different towns in seventh grade, and another twelve pairs of classes in 
Grade X. Of the eighty-four differences forty favored the telling method 
and forty-four the doing. No evidence was given of the significance of 
the differences as indicated by the schools separately, but the differences 
were in many cases substantial and probably would have separately 
suggested statistical significance. Stewart (72) also used in his study 
twenty pairs of classes in Illinois, Iowa, and Minnesota to investigate 
the effect of teaching diagraming upon certain other English masteries, 
using a three-part analysis of variance. None of the F’s was significant. 
No showing is made regarding the schools individually, but the fact that 
the M×S (methods-by-schools) variance was considerable suggests that the schools must have
differed greatly in the effectiveness of the two methods. Wrightstone (89) 
likewise reported the New York experiments on the activity program in 
terms of paired classes instead of paired individuals. Altho all but three 
of twenty-four differences favored the activity program, the t ratios were,
in two-thirds of the comparisons, low enough to suggest much inconsist- 
ency in the findings from school to school. The study of separate schools 
in a replicated experiment is a practice to be encouraged, and may turn 
out to have a marked bearing on the issue to be stated in our next 
paragraph. 


Studies with No Inferences about Population Values 


A very large number of studies are intended to meet an immediate 
local problem and make no effort to draw inferences about where the 
“true” values (parameters) in a parent population lie. Sometimes this 
occurs because the author is unsophisticated in statistics, or because he 
is writing for an unsophisticated audience. In many cases the studies are 
made and written up by persons of high standing in research who are 
quite able to employ refined statistical procedures if they wish (e.g., 
25, 77, 78, 86). Apparently, influenced by such evidences as those referred
to in our preceding paragraph and by other considerations, they have
no great faith in the practical validity of the statistics of inference (in 
contrast with its purely mathematical validity). There is emerging a 
theory of education into which a science of education that attempts 
to mandate “proven” superiorities in methods of teaching or in values 
does not fit well (58). This progressive education type of theory holds 
that it is the right of each school and of each teacher and her pupils 
to choose and to plan their values and their methods themselves, not 
have them mandated by others, even by a “science of education.” The 
only manner in which research can serve these schools is to offer the fact 
that certain other pupils have found certain values or certain methods good 
and to carry the suggestion that the pupils and teachers try them for 
themselves (58, Chapter VI). In such a setting the dynamics of the 
local situation are a more powerful factor than any average of success 
elsewhere and may be expected often to upset predictions based on other 
pupils. Where human beings cooperate in real groups the mathematical 
laws of probability in sampling never hold completely because these 
laws are predicated on independence among the elements; in a group, 
social beings tend to be drawn into a certain degree of solidarity, and 
even in the relatively unsocialized schools of the past this socializing 
dynamic doubtless often upset the theory of sampling. But as the demo- 
cratic movement in education sweeps onward, this dynamic socializing 
force within classroom groups may so completely overturn a theory of 
probability based on the assumption of independence among the indi- 
viduals as to give to the statistics of inference a very different and less 
important status than it has seemed to have previously, and a very dif- 
ferent status from that which it will continue to have in such fields 
as agriculture. For this new educational condition teachers can offer 
to one another descriptions of what they did and how it worked out for 
them; but to assert its probable goodness for an infinite population, and 
to offer it thus as an implied mandate in a science of education, will 
be far more presumptuous than it was in the days of the made-in-advance 
school, where pupils waited to be manipulated by their teachers and 
teachers waited to be manipulated by their supervisors. 

Another mark of the tendency to make educational research serve 
local needs rather than to build a pure science of education is revealed 
by the large number of doctors’ dissertations that construct local pro- 
grams by implementing in action theories of education. Of the 1364 
doctors’ dissertation topics reported to Good (24) during the three-year 
period, the titles of at least 142 indicate that they are of this nature. 
The immense number of well-controlled inductive studies which con- 
stitute the findings of a science of education have done relatively little 
to affect classroom practice thruout the country, at least directly; for 
they have lain on shelves unknown by the rank and file of teachers. 
The constructive projects of local application are likely to be put to 
use at least in the communities for which they were made. 


CHAPTER V


Observational Methods of Research 


SAUL B. SELLS and ROBERT M. W. TRAVERS 


This chapter reviews methods of research based upon survey technics,
the use of questionnaires, interviews, ratings and rating scales, case studies, 
autobiography, direct observation, and instrumental recording. Material is 
drawn from several related fields which use these research instruments as 
well as from educational research proper. Transferability of method to 
problems in educational research has been the chief criterion in the selec- 
tion of references. 


Survey Technics 


General 


The basic aspects of research thru surveys include the instrument or 
vehicle of data collection; the method of collection; the definition and
selection of the survey population or sample; the methods and technics 
of summarizing and analyzing data; and the textual, tabular, and graphic 
presentation of results. The large-scale data collection problems involved in 
the administration of important wartime government controls gave rise 
to the development of special skills and procedures by those who directed 
these projects. Three agencies published manuals on surveys and form 
design methods (72, 73, 96, 97). 

The Office of Price Administration published two manuals, described 
by Sells (79). The first (72) dealt with substantive issues in survey plan- 
ning and design, presenting criteria and rules governing the relationship 
of form and survey design to objectives, elimination of unnecessary items, 
simplification of respondent’s task and reduction of response time, prin- 
ciples for achieving simplicity of questions and improving respondent's 
understanding of his task, statistical planning (including statistical design, 
objectivity of questions, methods of data collection, sampling plan and 
tabulation plan), and administrative factors of cost, timing, utilization 
and application of results, and public relations. The other OPA manual 
(73) covers mechanical problems of form design, media of duplication, 
paper, type, special mechanical features, and form standardization. 

Deming (25) with Bureau of Census experience as a background 
analyzed the main factors affecting the accuracy and usefulness of surveys. 
His list of thirteen sources of error, while slightly repetitive, due to the 
nature of the material, has been found useful: 

 1. Variability in response
 2. Differences between different degrees and kinds of canvass
    a. Direct versus indirect interview
    b. Intensive versus extensive interview
    c. Long versus short schedules
    d. Check block plan (checklist) versus (free) response
 3. Bias and variation arising from the interviewer
 4. Bias of the auspices
 5. Imperfections in the design of the questionnaire and tabulation plans
    a. Omitting questions that would be illuminating to the interpretation of other
       questions
    b. Wrong wording, eliciting an answer liable to misinterpretation
    c. Forcing the respondent into a pattern
    d. Failing to perceive what tabulations would be most significant
 6. Changes that take place in the universe before tabulations are available
 7. Bias arising from nonresponse (including omissions)
 8. Bias of late reports
 9. Bias arising from an unrepresentative selection of date for the survey
10. Bias arising from an unrepresentative selection of respondents
11. Sampling errors and biases
12. Processing errors
    a. Coding
    b. Editing
    c. Machine and tally errors
    d. Posting and consolidating
13. Errors in interpretation
    a. Bias arising from bad curve fitting or adjusting
    b. Misunderstanding the questionnaire: failure to take account of the
       respondent’s difficulties (often through inadequate presentation of data),
       misunderstanding the method of collection and the nature of the data
    c. Personal bias in interpretation.


Consumer Interviewing: Opinion and Market Research


Blankenship (17), Cantril (19), and Gallup (35) have written books
presenting comprehensive analyses of the problems and technics in market
research and public opinion polling.

Technical papers have dealt principally with factors affecting reliability 
and validity of data. Hilgard and Payne (48) compared the characteristics 
of persons found at home with those not at home when the interviewer 
called. They concluded that “people easily found at home on the first call 
differ significantly from those found at home only after repeated calls. 
The latter occur in large enough proportions to make it important for 
repeated calls to be made in order to represent them in sample surveys.” 
Lazarsfeld (62) outlined six main functions of the open-ended interview 
(as contrasted with the yes-no, multiple-choice, or checklist types).
These deal with clarification of the respondent’s answer, singling out
decisive aspects of his opinion, its relationships and motivation. More 
complete replies aid in interpreting statistical relationships. Gosnell and 
de Grazia (42) analyzed errors arising from polling interviews and 
means of reducing them. They cite as sources of error: interpersonal ten- 
sion caused by respondent’s sense of insecurity; economic, educational, 
racial, and nationalistic differences between interviewer and respondent; 
and excitement level, consideration time, and political party activities. 


Studies by Friedman (34), Katz (56), Stanton and Baker (83), and 
Udow (95) on interviewer bias indicated that the significance of this 
factor varies in different situations. Stanton and Baker, using nonsense 
geometric figures in a recall experiment, introduced interviewer bias 
experimentally by giving the interviewers the correct answer “key.” 
They found that the bias of the interviewer exerts some determining effect 
upon the outcome of the interview even when the interviewer is expe- 
rienced, the direction of the bias is known to him, and the material has 
no personal or emotional connotation. The effect of the bias was found 
to be more pronounced upon incompletely learned, or remembered, 
material. They assumed, pending further study, that minimal cues and 
errors in recording might account in part for the results. However, Fried- 
man, using a different procedure, failed to confirm Stanton’s and Baker’s 
findings. Katz obtained different poll results on labor and war issues when 
he used two sets of interviewers, one trained white-collar group and one 
experimental working-class group, on the same survey, working with 
identical instructions. He concluded that social status of the interviewers 
influenced the findings. Udow, in two market research and opinion sur- 
veys, found that neither the interviewers’ own opinions nor their knowl- 
edge of the sponsorship were significant variables in the results. 

Rugg and Cantril (78) found that “the extent to which the wording 
of questions affects the answers obtained depends almost entirely on 
the degree to which the respondent’s mental context is solidly structured.” 
People who lack reliable and consistent frames of reference “are highly 
suggestible to the implications of phrases, statements, innuendoes, or 
symbols of any kind that may serve as clues to help them make up 
their minds.” Questions which bluntly state some deviation from an 
established norm are less likely to receive favorable replies than questions 
which imply the same deviation but state it more by implication. Where 
a new and somewhat complicated problem is to be posed about which 
people have thought little, the free-answer type of question should be 
used. “The split-ballot technic should be used wherever possible to test 
stability and consistency of opinion by noting the effect of . . . variation 
between free and prescribed responses.” 

The strength of drives to win approval or to avoid social disapproval
was found by Gordon and Davidoff (41) to cause serious dishonesty
and hence unreliability of scores on adjustment questionnaires.

Stonborough (86) described the advantages over other methods of 
market research of a continuous controlled sample of consumers who are 
motivated to keep a careful diary record of purchases. The consumer 
panel technic is valuable for many problems in educational and social 
research. 


Ratings and Rating Scales 


Teacher ratings—Barr and Harris (12) developed a teachers’ perform- 
ance record which provided a record of the observable behavior of teachers 
and pupils and the data necessary for an evaluation of what is observed.
It contains space for recording teacher and pupil activities, entries 
relative to their evaluation, and a scale for evaluating the personal fitness 
of the teacher. Baller (11) developed a case study instrument for evaluat- 
ing teachers’ understanding of child growth and development, entitled 
“The Case of Mickey Murphy.” It presents a child development situation 
which would reveal, in the interpretations and recommendations which 
teachers in training give it, something about their understanding of 
growth and development. Smalzried and Remmers (81) applied the 
Thurstone method of factor analysis to student ratings of faculty members 
on the Purdue Rating Scale. This analysis produced two factors, desig- 
nated Empathy and Professional Maturity. Empathy correlates highly 
with fairness in grading, personal appearance, sympathetic attitude toward 
students, and liberal and progressive attitude, while the greater factor 
loading for Professional Maturity is accounted for by self-reliance, confi- 
dence, and presentation of subjectmatter. Haggard (43) found that college 
freshmen ranked ability in teaching and organization of subjectmatter 
highest among characteristics of a desirable teacher, while appearance was 
ranked lowest. Freshmen, as compared with seniors, placed more em- 
phasis on characteristics related to human relationship of teaching. 
Henrikson (47) found that ratings of voice exercised a strong halo effect 
on teacher efficiency ratings made by practice teaching and public-school 
supervisors. 

Course ratings and evaluation of outcomes—Marzolf (66) had sixty-one 
statements, descriptive of possible outcomes of a teacher education course, 
rated as to desirability by 275 students and thirty-three faculty members 
of a state normal university. He reported median rating, rank, and Q 
for each item. Johnson (54) obtained replies from 12,425 graduates of 
Chicago high schools on a follow-up questionnaire designed to determine 
evaluations of the assistance of schooling in relations with people, in 
jobs and in subsequent education. The replies stressed “assistance in 
English and speech” and “training in vocations.” Dexter (26) published 
a third revision of a questionnaire, intended to be used as an objective, 
anonymous instrument for a student’s evaluation of a college course of 
study. It covers text, lectures, laboratory, field trips, quizzes, examinations,
class discussions, requirements, and general evaluation.

Speech ratings—Thompson (89) conducted a series of experiments on 
devices for measuring public speeches. Using college student audiences he 
found that the paired-comparison method is superior to rank order, and 
that a linear scale is about as accurate as letter grades, a “descriptive
letter scale,” the Bryan-Wilke scale, or a Thurstone-type attitude scale.
Because practice in rating has little effect and raters differ greatly in 
accuracy, individually and by groups, he concluded that further research 
should focus upon the raters rather than the methods. 

Home environment and socio-economic status scales—Lundberg and 
Friedman (65) scored 232 families in a rural Vermont township on the 
Chapin, Guttman-Chapin, and Sewell scales. High intercorrelations were
found. The report also discussed discrepancies between the scales. Sewell
(9) reported a correlation of .94 between a new fourteen-item scale and
the longer form. The reliability of the short form ranged from .81 to .87.
Kerr (58) reported the construction and statistical analysis of a home
environment scale designed for group administration. Reliabilities from
.84 to .91 and a high degree of validity were found. Items are grouped
statistically into four sections: cultural, aesthetic, economic, and miscel- 
laneous. Cantril (19) analyzed self-ratings of social and economic status, 
interviewers’ ratings, and reported income data of a representative cross 
section of the national population. His results indicated that the majority 
tend to identify themselves socially and economically with the middle class, 
that there is no one-to-one correspondence between social and economic 
identification, that the lower income groups tend most toward middle- 
class social identification, that there is a tendency to regard one’s social 
class as higher than one’s economic class, and that the disparity between 
social and economic identification increases up the social and down the 
income scale. Woofter (103) presented a technic for analysis of family 
composition and income, using as a yardstick the median per capita of 
the population. 

Behavior rating scales—Several new scales, inventories, and activities 
inventories have been reported: Cox and Anderson (23), Harris (46), 
Kopel (61), Mooney (67), and Smith (82). Tschechtelin (93) had 166 
children rate themselves and had six fellow pupils and four teachers rate 
them on both the Kelly 36-trait personality rating scale and the Tschech- 
telin 22-trait personality rating scale. The two scales had average corre- 
lations of .85 for teachers’ ratings, .76 for pupils’ ratings, and .64 for self- 
ratings. She concluded that this indicates that these child and adult scales 
were highly comparable and may, therefore, be used in systematic scienti- 
fic investigations of pupil and teacher personalities. Cox and Anderson 
(23) studied teachers’ responses to problem situations in a high school by 
obtaining teachers’ and students’ responses to a list of items selected from 
a mental-hygiene scale for teachers. They found that, both by teachers’ 
reports for themselves and by students’ reports of the teachers’ technics, 
the teachers in general either defeat their own purposes by making the 
problem worse or they use technics unrelated to the problem. 

Job ratings—Moore (68) criticized four types of job evaluation: job 
classification, job ranking, job elements, and point evaluation, which are 
most widely used. This paper outlines the principal steps in point evalua- 
tion together with an estimate of the validity of the technics. Stigers 
and Reed (85) outlined a complete system of job evaluation. This con- 
sists of three steps: analyzing the factors, measuring their strength or 
value in terms of points, and converting the point values into money 
values. A new element, called “accuracy of motion and/or position” has 
been added to the thirty-five presented earlier. After an element is 
identified in a job, a questionnaire is filled out to discover how it affects 
the job. Based on this information, point values are determined by means
of a rating scale or table. 
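
The three steps described above (analyzing the factors, measuring them in points, and converting points into money values) can be illustrated with a minimal sketch. It is not the Stigers and Reed system itself: the elements other than “accuracy of motion and/or position,” the point table, and the wage constants are hypothetical placeholders chosen only to show the arithmetic.

```python
# Hypothetical rating table: points awarded for each degree of each job element.
# Only the element name "accuracy of motion and/or position" comes from the text.
POINT_TABLE = {
    "skill": {1: 10, 2: 20, 3: 30},
    "responsibility": {1: 5, 2: 15, 3: 25},
    "accuracy of motion and/or position": {1: 5, 2: 10, 3: 15},
}


def job_points(degrees):
    """Step 2: look up the points for the degree assigned to each element."""
    return sum(POINT_TABLE[element][degree] for element, degree in degrees.items())


def hourly_rate(points, base_rate=0.50, rate_per_point=0.01):
    """Step 3: convert total points into a money value (assumed linear rule)."""
    return round(base_rate + rate_per_point * points, 2)


# Step 1 (analysis of the job) is represented by the degrees chosen for one job.
example_job = {
    "skill": 2,
    "responsibility": 3,
    "accuracy of motion and/or position": 1,
}
example_points = job_points(example_job)
example_rate = hourly_rate(example_points)
```

A linear conversion is assumed here for simplicity; a published system might equally use a stepped wage table.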

Employe service and efficiency ratings—There has been a definite recog- 
nition of the value of efficiency ratings for guidance as well as adminis- 
trative uses. This is recognized in the contributions of Halsey (44), 
Moore (69), Watkins (98), and Zerga (105). New contributions to 
method stress methods of development of rating forms, self-ratings, 
administration, interpretation of results, record keeping, and use in em- 
ploye-management relations. Steinmetz (84) and Fear and Jordan 
(29) have published new instruments. Zerga (105) reviewed the merit 
rating systems in use in a number of large industrial organizations. 
Halsey (44) and Watkins (98) reviewed merit systems used by govern- 
mental organizations. Tiffin and Musser (91) suggested the use of 
Z-scores to weight merit rating items.
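
One common reading of weighting by Z-scores is sketched below: each item's ratings are standardized across employes so that no item dominates the composite merely because its ratings are more widely spread. This is an illustration under that assumption, not a reproduction of the Tiffin and Musser procedure; the function names and sample ratings are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize one item's ratings to mean 0 and standard deviation 1."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        # An item on which everyone is rated alike carries no information.
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def composite_ratings(ratings_by_item):
    """Sum the standardized item ratings for each employe.

    ratings_by_item maps an item name to a list of ratings,
    one per employe, in the same employe order for every item.
    """
    standardized = [z_scores(ratings) for ratings in ratings_by_item.values()]
    return [sum(column) for column in zip(*standardized)]


# Hypothetical ratings for three employes on two items with different scales.
example = composite_ratings({
    "quality of work": [3, 4, 5],
    "attendance": [10, 20, 30],
})
```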

Other research using rating technics—Sumner and Clark (88) found 
that fifty-two adult Negro judges were unable either before or after a stand- 
ardized individual interview effectively to rank seven Negro college 
freshmen as to estimated test intelligence. Kay (57) analyzed the effects
of stereotypes and prestige suggestion on college students’ rankings of 
the prestige value of twelve occupations. Eysenck (28) applied the psy- 
chophysical method of direct comparisons to the measurement of aver- 
sions and satisfactions. Howard (49) analyzed the complexity of mental 
processes in science testing by having college professors and graduate 
students rate 180 items in the Cooperative General Science Test according 
to complexity (from mere memorization to complex integration of infor- 
mation). Thorndike (90) analyzed the ratings of 155 teachers of the 
extent of the contribution made by their studies and occupations to 
their general intellectual and character training and their interest in 
these activities. Abramson (1) studied the ratings by forty-nine high-school
graduates of the formative influence upon vocational choice of
twenty possible factors.


Interviews 


Interviewing is an important technic of evaluation, guidance, data col- 
lection, and therapy. Principles of good technic are common to all of these 
applications. Several contributions to interviewing technic have appeared:
Fearing and Fearing (30), Fenton (31), Garrett (36), Otis (74), and Porter
(75, 76). Williams (102) described the basic instructions used
by interviewers of the National Opinion Research Center. This covers
selection of respondents, approach, attitudes, types of answers, place of
interview, and supplementary information. Edmiston (27) described the
use of the group interview technic in appraising the professional program 
in New York State teachers colleges. Freeman (33) described the essen- 
tials of the “stress interview” in selection of employes. This technic con- 
tains five parts: nonstress questioning, nonstress action, stress questioning,
stress action, and post-stress questioning. An examining board rates
the applicant on a series of rating scales. 

Child (21) described the treatment of data obtained in a study of the 
reactions of individuals in an acculturating group as an illustration of 
several methodological points in the use of interviews. First, certain con- 
trols can be introduced to insure much objectivity in analysis of data. 
Second, interview data can be used for construction of quantitative scales 
comparable to those commonly derived from tests and questionnaires. 
Third, interviews afford a total evaluation of the individual subject which 
yields conclusions not as easily accessible to quantitative technics. 

King (60) reported the use of idea-centered questions in interview 
schedules. A free method of wording questions was used, while attempt- 
ing to state clearly the ideas and issues involved. Reliability, measured 
by returns of two interviewers, was satisfactory. 

Franz (32) analyzed the similarity of Moreno’s psychodrama technic 
to interviewing as a research aid. It has the advantage of reducing the 
possibility of concealment of facts and of allowing data to be gathered 
in a life-like situation. 

Strang (87) and Young (104) studied reading interests of school 
children thru personal interviews. 


Anecdotal Records, Case Studies, Autobiography, 
and Direct Observation 


Hamalainen (45) studied the effectiveness of anecdotal records of 
behavior in and out of the classroom as a basis for teacher appraisal 
of pupils. He compared teachers’ rankings of pupils on the basis of anec- 
dotes with ranks on several standardized tests of interests, achievement, 
social studies, and personality. He concluded that teachers are able 
substantially to judge pupil social relationships after using anecdotal 
records; the anecdotes revealed interests and interest changes not shown 
in the Hildreth inventory; the success of the anecdotal method is depend- 
ent upon the outlook and training of the teacher and the type of the 
educational program. Gaw (37), with reference to records for use by 
the dean of women, stressed the need for divorcing descriptive material 
from inferences and for using all available autobiographical material. 

Three studies discussed quantitative technics for treating qualitative 
data. Wherry (101) outlined a method whereby biographical or other 
qualitative data may be used to predict success or failure on an inde- 
pendent criterion. This is a least squares equation with a transformation 
equation for punch-card coding. Bittner (16) used the Wherry-Doolittle 
technic to predict college entrance from qualitative biographical ques- 
tionnaire responses. Jones (55) described methods for describing and 
summarizing socially significant factors in motion pictures. 

Ludeke and Inglis (64) developed a technic for validating interview 
data on portions of a magazine read by checking reports against records 
made by observers thru concealed one-way vision screens. 

Arrington (8) published a comprehensive review of methodological
and behavioral findings using the time sampling technic. This is a method 
of observing the behavior of individuals or groups under ordinary 
conditions of everyday life in which observations are made in a series 
of short time periods so distributed as to afford a representative sam- 
pling of the behavior under observation. Chin (22) studied conformity 
behavior by recording the time of arrival of college students at a nine 
o'clock class over several weeks. By means of a questionnaire he isolated 
some of the factors affecting the distributions obtained. 
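
The idea of distributing short observation periods so that they represent the whole of the behavior can be sketched as follows. The sketch assumes one particular scheme (one period placed at random within each equal block of the session); the review does not say that this is the scheme used in any study, and the function name and parameters are hypothetical.

```python
import random


def time_sample_schedule(session_minutes, n_periods, period_minutes, seed=0):
    """Return start times, in minutes from the start of the session,
    for a series of short observation periods.

    The session is cut into n_periods equal blocks and one observation
    period is placed at a random offset inside each block, so that every
    part of the session is represented in the sample.
    """
    block = session_minutes / n_periods
    if period_minutes > block:
        raise ValueError("observation period does not fit inside one block")
    rng = random.Random(seed)  # fixed seed so a schedule can be reproduced
    starts = []
    for i in range(n_periods):
        offset = rng.uniform(0, block - period_minutes)
        starts.append(round(i * block + offset, 2))
    return starts


# Six five-minute observations spread over a three-hour session.
example_schedule = time_sample_schedule(180, 6, 5)
```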

Direct observation of behavior, while expensive in time and personnel, 
is nevertheless one of the richest sources of information. Bell (14) made 
observational records of ninety-three children aged three to eight 
during periods of dental treatment in clinics, recording the dentists’ 
behavior concurrently. The results, supplemented by interviews with the 
children and their parents, disclosed needs for training parents regarding 
children’s dental needs and behavior and for training dentists in the 
guidance of child patients. Studies by Appel (7), Arsenian (9), Baruch
and Wilcox (13), and Bonte and Musgrove (18) of children of nursery-
school and preschool ages illustrate the value of observational methods
in the study of aggressive behavior, personal and social adjustment, per- 
sonality development, and play activities. Biber and others (15) devel- 
oped a general summary of behavior characteristics of a seven-year-old 
group, using as sources recorded observations of work and play activities, 
spontaneous behavior, and expressed opinions, supplemented by objective 
and test data. Nesbitt (71) analyzed problems of student nursery-school 
teachers using observational records of their actual performance. Moor- 
head and Pond (70) reported the results of a five-and-a-half-year study 
of spontaneous music behavior of children aged two to six. They con- 
cluded that the “program-music” concept of children’s music (story 
telling and picture painting) is too narrow. 


The Use of Instruments and Machines 


In almost every field of human endeavor, machinery now accomplishes 
much of the work previously performed by human beings, and instru- 
ments have substituted accurate quantitative measurement for inaccurate 
evaluations based on subjective judgment. However, the process of edu- 
cation has been influenced only indirectly for the most part by the trend 
towards the greater use of machines. Apart from the occasional use of 
the movie projector or the radio, the most important use of instruments 
and mechanical devices in education is to facilitate measurement, and
particularly the measurement of sensory thresholds and the measurement 
of abilities and aptitudes. 

The need for an accurate mechanical device for scoring objective tests 
has been recognized for nearly two decades. Amongst the precursors of 
the modern scoring machine is one developed by Pressey in 1932 (77). 
This device like all other test scoring machines required the use of a
separate answer sheet. The student was required to punch out the correct 
answer on the answer sheet. The correctly punched holes were picked up 
by the scoring machine thru catching in a pinion gear as it moved across 
the answer sheet. Another type of mechanical scoring machine was 
developed by Cuff (24), and also required the use of an answer sheet 
in which the answers were punched out by the examinee. In Cuff’s 
machine, scoring was accomplished by placing the answer sheet in a 
frame which was held rigidly over the platform of a scale. Weights 
were lowered over the positions in the answer sheets which corresponded 
to the correct answers. If an answer was correctly punched, one-fourth 
ounce weight passed thru the hole in the answer sheet and depressed the 
platform of the balance by a small amount. The total number of correct 
answers could be determined by reading the scale of the weighing 
machine which was graduated in one-fourth ounce units. 
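
The arithmetic behind Cuff's weighing scheme is simple enough to show in a short sketch. Only the one-fourth-ounce weight per correct answer comes from the description above; the function name and the sample reading are hypothetical.

```python
from fractions import Fraction

# Each correctly punched answer lets one quarter-ounce weight rest on the scale.
WEIGHT_PER_CORRECT_OZ = Fraction(1, 4)


def correct_answers_from_reading(reading_oz):
    """Convert a scale reading in ounces into a count of correct answers."""
    count = Fraction(reading_oz) / WEIGHT_PER_CORRECT_OZ
    if count.denominator != 1:
        raise ValueError("reading is not a whole number of quarter-ounce weights")
    return int(count)


# A reading of five and one-quarter ounces corresponds to 21 correct answers.
example_score = correct_answers_from_reading(Fraction(21, 4))
```

Graduating the scale in one-fourth-ounce units, as the text describes, performs this division mechanically, so that the score is read directly.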

The present type of test-scoring machine in common use developed 
by the International Business Machines Corporation is well known (51, 
52). However, it should be noted that this machine has an advantage 
over previous devices since it can be used for making rapid item analyses 
and can be used for a number of other varied purposes. 

Lorge (63) has reviewed the applications of the International Business 
Machines to educational research. In this number of the Review, he has 
reviewed some further applications of the tabulator, sorter, multiplier, 
test-scoring machine, and the graphic item counter (Chapter 5). A useful 
bibliography of recent applications of I. B. M. equipment has been pub- 
lished by I. B. M. (50). A recent adaptation of I. B. M. punched-card 
equipment for recording responses in a multiple choice situation was 
made by Gaylord (38) who adapted a numerical hand punch so that 
subjects recorded their responses by punching holes in an I. B. M. card. 

Much mechanical ingenuity has been devoted to the development of 
instruments for diagnosing reading difficulties and for providing remedial 
treatment. The Ophthalmograph (3, 4, 5, 6) designed by the American 
Optical Company is in essence a camera adapted for photographing eye 
movements during reading. The Metronoscope (5, 6) produced by the 
same company is a device for training readers by pacing eye movements,
preventing regressions, and establishing rhythmical left-to-right movements
of the eyes. A simpler portable instrument, the Junior Metronoscope, (2) 
also includes the necessary optical mechanism for corrective work in 
connection with inadequate oculo-motor coordination and fusional difficulties.
While these devices are mechanically well designed, there is as
yet no indisputable evidence that the results achieved with them are 
greatly superior to those achieved without them. Traxler (92) surveyed 
the literature on controlled reading and concluded that the results of 
research do not provide clear-cut evidence, favorable or unfavorable, to 
controlled reading. However, Traxler added that the evidence tends to be 
on the favorable side toward the use of instruments of the type described. 

The traditional device for measuring visual acuity, the Snellen chart,
is now recognized as having limited value. Two new devices are now 
available for this purpose which overcome many of the weaknesses inher- 
ent in the old letter-reading test and provide tests for a much wider range 
of visual functions. One of these, the Orthorater (53) provides measures 
of visual acuity for both distant and near vision and in addition enables 
the examiner to test for phorias, depth perception, and color perception. 
Another instrument of modern design for measuring visual acuity is the 
Telebinocular (59) which provides measures of visual acuity both for 
near and far vision. The telebinocular test series is given with both 
eyes open and it is possible to determine whether the subject is suppress- 
ing or blocking vision in one eye. It is also possible to study with this 
instrument problems of fusion and lateral imbalance. Both the Orthorater 
and the Telebinocular can be used satisfactorily with illiterates. 

The modern type of audiometer is slowly replacing the discredited 
whisper test. At the present time the Western Electric Company produces 
two types of audiometers, the four-type audiometer and the six-type audi- 
ometer. The most recent model of the four-type is the 4C audiometer 
(100) which is a phonograph to which forty earphones may be attached. 
Each subject applies an earphone to the ear to be tested and records on 
a piece of paper the digits which he hears. Phonograph discs with two-digit
numbers recorded on them are used for testing the hearing of
children below the fifth grade, and discs with three-digit numbers are 
used for testing older children and adults. If a more complete diagnosis 
of hearing losses is required, then the 6B audiometer (99) may be used. 
This latter instrument permits the testing of auditory acuity at any desired 
frequency from 128 to 9747 cycles per second. The 6B instrument also 
enables the examiner to test bone-conduction losses in each ear separately. 

In the field of speech training, some use has been made of voice record- 
ing instruments as training devices. However, relatively few systematic 
researches into speech problems have made use of such speech recordings. 
Gilkinson (39) reviewed 354 studies in speech and noted only two studies 
in which such records were used. A notable example of the use of speech 
records in research is provided in a study by Goldstein (40) who 
investigated the relation between speed of speech and comprehension. 


CHAPTER VI


Tests and Measurement 
J. RAYMOND GERBERICH 



Condensation into one chapter of material equivalent to that covered
by Greene, Jorgensen, and Gerberich (56) in the December 1938 issue of 
this Review has necessitated arbitrary decisions concerning types of mate- 
rial to be reviewed. This report, therefore, is limited to literature concerned
mainly with the measurement of relatively tangible instructional out- 
comes and the interpretation and guidance uses of results from such 
measurement. In a recent issue of this Review (45), Cornell (22) and 
Traxler (115) reviewed the construction of, and Freeman (52) and 
Darley (28), the application of results from, respectively, tests of intel- 
ligence and measurements of personality and character, and Sells (100) 
dealt with the measurement and prediction of abilities. 


General Textbooks and Reference Sources 


Greene, Jorgensen, and Gerberich (55, 56) wrote complete revisions 
of the two general texts for the elementary school and the secondary school 
which appeared in the middle thirties under the authorship of Greene 
and Jorgensen. Remmers and Gage (93) brought out a book on measure- 
ment and evaluation, and Darley (29) wrote on testing and counseling 
in the guidance program of the high school. Brereton (13) furnished 
general and historical backgrounds and proposed reforms for the examina- 
tion system in English schools. 

Greene and Crawford (54) revised the Greene workbook in educational 
measurement and evaluation. McKown (80) wrote for the benefit of 
students on how to pass a written examination. Swineford and Holzinger 
(109, 110, 111) continued their annual reviews of periodical literature 
on the theory of test construction. 


Problems Involved in Educational Measurement 


Scates (98) outlined five major respects in which scientists and meas- 
urement specialists differ markedly from the classroom teacher in their 
criteria of measurement. He generalized that measurement specialists, 
thru their primary interest in details, specifics, and formalities, have 
largely failed in standardized tests to attain measurement of the totality 
of behavior with which the teacher is directly concerned. The five major 
differences listed and discussed are: 

1. The demand for rigor; the scientist seeks truth and broad generalizations, while
the teacher seeks information of direct, practical value.

2. The approach to complexity; the scientist is interested in elements, whereas the
teacher is interested in functioning organisms.

3. The attitude toward immediacy; the measurement specialist cannot measure continuously,
but the teacher needs to and must measure continuously.

4. The concept of human development; the scientist measures traits uniform thruout
their range, but the teacher measures growth in stages.

5. The attitude toward vital aspects of learning; the measurement specialist generally
measures formal abilities by cross-sectional power tests, but the teacher must be concerned
with behavioral dynamics and abilities in life situations.


Bloom (11) and Traxler (118) discussed major problems encountered 
by the educational test worker. Sims (102) pointed out some of the 
ways in which educational measurements could contribute more effec- 
tively to the broad evaluation of pupil behavior which is so important 
in the modern school. Pitfalls in the use of tests were pointed out by 
Kirkendall (70), and Dunkel (36) discussed misconceptions concerning 
measurement and evaluation. 

Saucier (97) critically analyzed statements made about and defenses
given by test specialists for objective tests, and MacNeill (82) defended
the essay examination for achievement testing.


Trends in Educational Measurement 


Trends and tendencies in measurement continue to develop slowly but 
unmistakably in the direction of more functional and less formal tests 
and more intelligent use of measurement results. Monroe (83) pointed
out that altho the “battle” for objective tests in educational measurement
was won before 1920, growth of the measurement movement from 1920
to 1945 represented progress from early adolescence to early adulthood.
A symposium of the Committee on Measurement and Guidance of the 
American Council on Education (112) pointed up many uses of test 
results far beyond those conceived a decade ago. 

Evidence concerning direct effects of World War II appeared in a dis- 
cussion by Segel (99) of six major and three minor trends in testing 
and in a survey (38) of inquiries concerning testing and guidance services
received by the Occupational Information and Guidance Service,
Federal Security Agency.


Construction of Instruments for Measuring Achievement 


Attention to the construction of tests measuring functional behavior 
rather than subjectmatter knowledges and understandings has doubtless 
been stimulated of late by Smith and others’ (103) report of testing 
procedures in the Eight-Year Evaluation Study. At least two standardized 
tests (50, 133) have dealt with behavior in this broad sense. 

Fordyce (49) and Yauch (144) presented suggestions concerning the 
construction of functional rather than formal tests by teachers and by com- 
mittees. Engelhart (40, 41) illustrated and discussed several unique test 
exercises. Raths (92) reported on the development of a test of thinking 
by teachers, and Roody (95) discussed a plot-completion test of significance
in personality measurement. An experiment with pupil-made test
items in seventh-grade history was conducted by Early (37). 

At a somewhat more technical level, Smith (104) traced the steps in 
constructing and validating a general information test for preschool age 
children; Anderson (5) developed a technic based on that of the Stanford- 
Binet for achievement testing in psychology; and Pace (86) discussed 
and gave examples from a test designed to relate educational theory and 
practice. Kaulfers (69) based his technic for measuring oral fluency
in modern foreign languages on the experiences of the armed services in
World War II.


Validity and Reliability of Achievement Measures 


Statistical aspects of test construction with major concern for test and 
item validity and test reliability were treated by Fattu (44) and Conrad 
(20) in recent issues of this Review. Articles of both general and technical 
nature now frequently use analysis of variance or factor analysis technics 
of research. 

Test validation—Davis (32) discussed the relationships of achievement 
testing methods to course objectives and instructional emphases. Richard- 
son (94) developed a simple procedure for determining test validity in 
terms of the increased efficiency of a selected group of personnel. Toops 
(114) used a “success profile” for validating a test, in determining scoring
formulas, and in weighting test parts, and presented a case for
selecting the criteria before a test was administered as a substitute for the 
common self-validation methods. Sims (102) mentioned the possibility 
of test validation on the basis of retention and significance for future 
growth of the test content. A three by three chart of relationships between 
aptitude scores and course marks was developed and illustrated by 
Krathwohl (71). 

Item validation, difficulty, and scoring—Effects of item difficulty and 
chance successes on item and between-test correlations were studied by 
Carroll (16), who also developed a two by two table for relating data 
on item difficulty and chance successes. McNamara and Weitzman (81) 
investigated the difficulty of multiple-choice items in terms of the position 
of the correct response, and Grossnickle (57) studied the effects of exam- 
ple difficulties and arrangement upon scores in a test on division of 
decimals. 

Copeland and Gilliland (21) compared reliabilities and validities of 
four basic objective item forms for the measurement of achievement in 
child psychology. Different forms of spelling tests were compared by 
Brody (15). Curtis, Darling, and Sherman (27) and Wright (143) studied 
a modified form of true-false item in science instruction. Methods of 
scoring rearrangement tests were reported by Rosander (96) and Odell 
(85). Casanova summarized literature on measurement of randomness 
in order of correct-response position (17) and developed formulas for 
using the method of runs in testing randomness of order (18) in objective 
test items. Dickenson (35) presented a method for detecting cheating
on tests having a definite number of responses per item by means of
identical errors greater than would be accounted for by pure chance.

Test reliability—Cronbach (26) critically examined the split-half, 
rational equivalence or Kuder-Richardson “Footrule,” and parallel-split 
methods of estimating test reliability. Jackson (65) determined relation- 
ships between estimates of reliability obtained by the internal-consistency 
and test-retest methods thru use of an analysis of variance technic. Lord 
(79) examined the influence of the number of alternatives per item 
upon the reliability of multiple-choice tests. An approximation method of 
factor analysis was applied to test items in the estimation of test reliability
by Wherry and Gaylord (136).


Factors Affecting Test Scores and Test Performance 


Influence is often exerted upon pupil test performance by such constant, 
often unrecognized factors as may occur in the test or in its motivation and 
by variable, often well-hidden factors reacting upon the individual child. 

Tyler and Chalmers (125) studied the effect upon scores of advance 
warnings of tests to junior high-school pupils in general science. Plowman 
and Stroud (88) investigated the learning which resulted when pupils were 
informed concerning the correctness of their responses to objective test 
items. 

The psychiatrist’s approach was used by Liss (78) in studying pupil 
anxiety in examinations. Hastings (58) used a questionnaire in studying 
examination tensions and their relationship to scores on an achievement 
test. Waite (132) employed a laboratory method in determining the 
emotional responses occurring during the existence of a situation com- 
parable to that of an examination. Test performance in relation to socio- 
economic levels and persistence in the examination situation were studied 
by Fleming (48) and Briggs and Johnson (14) respectively. Pritchard 
(91) reported on the effect of ability in such motor skills as pencil 
manipulation on rate scores in arithmetic. 


Evaluation of Testing Technics and Standardized Tests 


Traxler (119) discussed seven problems encountered by the test maker 
in the field of reading, while Davis (31) outlined eight groups of skills 
desirable for measurement in a reading test. Wilking (138) suggested the 
use of the Roget Thesaurus classification of words into twenty-four cate- 
gories distinguished by philosophical criteria as a basis for selecting 
vocabulary in reading tests and checked the Iowa Silent Reading Test 
vocabulary against this criterion. Cronbach (25) surveyed methods of 
vocabulary testing under five ability headings and considered the relative 
merits of different item forms for testing each type of ability. 

The validity of tests in beginning reading was studied by Stone (107) 
in terms of their vocabulary load. Poston and Patrick (89) evaluated 
word recognition tests using word and picture matching for pupils in
the primary grades. Neill (84) analyzed the Latin-American content of 
standardized tests in the social studies and several other subject areas. 
Hendricks (59) and Woods and Martin (141) analyzed tests and testing 
practices in the areas of college chemistry and musical education 
respectively. 

Tinker (113) and Blommers and Lindquist (10) investigated relation- 
ships between speed or rate of reading and comprehension in achievement 
test scores at the elementary- and secondary-school levels, and Barnes and 
Mouser (8) compared scores of high-school and university freshmen on 
a test of biological misconceptions. Learned (74) reported an extensive 
study in which he analyzed the cases of college seniors for whom a great 
discrepancy existed between course marks and scores on the Graduate
Record Examination. 


Interpretation and Use of Measurement Results 


Jackson and Ferguson (66) pointed out the values of score distributions 
of U-shaped, J-shaped, bimodal, platykurtic, and leptokurtic types for 
serving certain specialized purposes in the interpretation of results. They 
recognized that statistical difficulties would arise in analyzing the results 
of such distributions, inasmuch as sampling error theory is largely based 
on the normal distribution. 

Cornell (23) developed a procedure for obtaining age progress per- 
centile norms by relating achievement levels to the ages of elementary- 
school pupils. Stevason (106) developed and illustrated a graphic method 
of converting test scores to marks on a five-point scale by methods based 
on the quartiles and on the standard deviation. 

Among articles reporting general use of test results were those of Jones 
(68) and Darley (30) in personnel work of the high school and college 
respectively and that of Jacobson (67) by accrediting agencies. Traxler 
(121, 122) dealt broadly with the use of test results in diagnosis in the 
tool subjects and in the appraisal of personality. Lindquist (76) wrote 
on the interpretation and use of results from the Iowa Tests of Educa- 
tional Development by the teacher. Traxler (117) wrote on individual 
evaluation. 

Sells (101) and Strang (108), respectively, discussed the use of 
educational and psychological test results and of data on reading ability, 
habits, and interests in cumulative pupil records. Ewing (42) reported 
findings concerning the use of standardized reading tests by teachers 
colleges, and Triggs (124) reported on diagnostic test results as basic to 
the correction of spelling deficiencies of college students. 


Credit by Examination before World War II 


College credits have been awarded upon the basis of satisfactory performance
on comprehensive achievement examinations in some institutions
for as long as fifteen years, and the movement toward the awarding
of credits for demonstrated masteries acquired informally rather than 
solely upon the basis of time serving in classes has gained impetus during 
the last few years. Pressey (90) justified this method of earning credits 
and recommended that the plan be accepted as a major academic 
procedure. 


Credit by Examination and In-Service Course Tests 
of World War II


Not long after the entry of the United States into World War II the 
American Council on Education, wishing to forestall the type of unrea- 
sonable and harmful awarding of “blanket” credit which followed World 
War I, called a special conference to develop policies and procedures for 
evaluating educational proficiencies developed by men and women in the 
armed forces. A recommended program (4), formulated by the Council 
working in cooperation with the armed forces, outlined the types of 
experiences gained in the armed services, indicated the desirable types 
of examinations for evaluating learning outcomes and suggested a course 
of action appropriate for American secondary schools and colleges. 
Tyler (127, 128, 129) and Lindquist (77) furnished progress reports 
for the program in general, and a committee representing the secondary 
schools (19) and a symposium for the colleges (130) interpreted its 
implications for institutions at these two levels. 

In an expandible publication at present in loose-leaf form (2) the 
American Council on Education provided the framework for the opera- 
tion of a program including not only accreditation of informally acquired 
proficiencies by examination but also for acceptance of credit for the 
formal learning experiences gained in the service training programs of 
the various armed service branches, the specialized training programs 
(such as the ASTP and Navy V-12) conducted by contracting schools 
and colleges for the armed services, and the correspondence and self- 
teaching courses offered for off-duty time by the Armed Forces Institute 
and other educational agencies. The informal learning experiences recom- 
mended for accreditation by examination were classified as of three types: 
(a) direct observation and experience in countries visited, (b) experiences 
incidental to military services thru on-the-job fulfilment of duties after 
the completion of formal training, and (c) self-directed study and self- 
education thru reading, educational movies and lectures, and organized 
discussions. 

The Armed Forces Institute provided three basic types of tests (1):
the end-of-course tests for use with correspondence, self-teaching, and 
group-instruction courses, and the two types of tests—general educational 
development and subject or field—designed for measuring learning out- 
comes from informal experiences, plus a specialized series of tests in 
electronics (34) for measuring outcomes of highly technical training pro- 
grams. The nine tests of educational development and the more than 
seventy subject tests are available in two forms; the A form is reserved
for the use of the Armed Forces Institute and of institutional examiners 
approved by the AFI, while the B form is commercially available (43) 
for use in establishing local norms or accrediting bases or for use as 
regular course examinations. Five tests for the secondary-school level 
of general educational development cover the areas of English, literature, 
natural sciences, social studies, and mathematics, and more than thirty 
tests in seven subject areas. Those for the college level include higher-level
tests of general educational development in the first four areas listed
for the secondary school and some forty subject tests in nine subject areas. 

Descriptions of the various AFI tests of general educational development
and subject or field series were provided by the American Council
on Education (2). Recommendations were also provided concerning 
critical scores to be used in awarding or denying credit by institutions 
to which veterans apply, altho it is made clear that the accrediting insti- 
tutions are free to set their own critical scores. Detchen (34) and 
Lindquist (77) reported on the standardization testing by means of which 
most of the recommended critical scores were established. 

Reports on various aspects of some of these tests were made by Ashford 
(7) and Hered and Thelen (60) for those in chemistry, and by White 
and Enochs (137) for those in the reading and interpretation of litera- 
ture. Crawford and Burnham (24) reported on the use of the general 
educational development battery with 135 civilian university students. 


Selection, Classification, and Post-Service Testing 
of World War II 


Selection and classification programs of various armed service branches 
have been treated rather extensively in the literature. Stalnaker and others 
(105) and Anderson and others (6) surveyed this aspect of testing in 
recent issues of this Review, and Davis (33) treated such testing in the 
Army and Navy. Weitzman and Bedell (134) added to the less extensive 
reports of in-service achievement testing, and Williamson (139) wrote 
on the use of tests in the vocational and educational guidance of ex-service 
personnel. 


Coordinated Testing Programs: Statewide, 
Regional, and National 


Statewide programs—Coordinated testing programs on a statewide basis 
continue to serve a variety of purposes in widely different types of schools. 
Peterson (87) found that twenty-eight of the states have had at one 
time and nineteen of the states now have coordinated statewide testing 
programs. All pupils at the grade levels served participate in only eight 
of the nineteen states. A major or supplementary purpose in thirteen 
of the nineteen states is improvement in articulation between the high 
schools and the colleges. Commercially published tests are used in twelve 
of the nineteen states, and colleges or universities coordinate the programs
in ten of the nineteen states. 

Findley (46) reported upon the guiding principles in construction 
and the validity for predicting general academic achievement in New York 
colleges of the Comprehensive Examination for Scholarship Awards. 
Wood (140) and Woody and Gatien (142) described programs operat- 
ing in Ohio and Michigan, respectively. Beers (9) outlined procedures 
used in administering, scoring, and reporting results for the university 
system of Georgia, and a similar report was made for the statewide 
program in Illinois (64). 

Lindquist (75) listed several advantages of the statewide or regional 
program, using the Fall Testing Program for Iowa High Schools (131) 
as the primary basis for his interpretations. He expressed concern that
not more states have instituted and carried thru such programs. 

Regional programs—Testing services on a regional basis have so far 
been provided largely if not entirely in connection with the effective 
prosecution of World War II. The administration of the Army A-12 and 
Navy V-12 Qualifying Test for Civilians thru regional directors under 
one director for the nation was mentioned by Lindquist (75) as indicating 
the practicability of establishing regional programs under a central agency. 
He recommended that such programs provide flexibility thru the choice 
of any one of several “core” programs and thru supplementation by 
tests chosen or constructed locally. 

National programs—Testing programs on a nationwide basis have so 
far been limited mainly to services for special groups of schools or to 
a relatively small number of cooperating schools. Hill (61) discussed 
possible modifications in the National Teacher Examinations. Learned 
traced the development of the Graduate Record Examination (73) and 
pointed out its uses in the educational placement of returning veterans 
(72). Traxler (116) reported on the Educational Records Bureau pro- 
gram. Other programs of nationwide scope are the medical aptitude 
tests of the American Association of Medical Colleges, the College 
Entrance Board Examinations, the National Freshman Placement Testing
Program, the National College Sophomore Testing Program, and the
College Chemistry Testing Program, all embodying one or both of the
limitations mentioned above. 


New York Times Test 


No account of tests and measurements for the last three years would 
be complete unless the Times test received attention. The charge that 
“college freshmen thruout the nation revealed a striking ignorance of 
even the most elementary aspects of American history, and know almost 
nothing about many important phases of this country’s growth and 
development,” was made by Fine (47), who concluded that the secondary 
schools had failed signally in their responsibility for teaching American 
history. His evidence was obtained from the scores made by 7000 freshman
students in thirty-six colleges on a questionnaire prepared by Hugh
Russell Fraser and Allan Nevins. 


Congressmen, public officials, the press, and the public reacted promptly 
both in defense and in support of the attack. Educators who found the 
test weak or the interpretation of results faulty included Boyd (12), 
Elicker (39), Hunt (62), Traxler (120), and Tyler (126). Charges made 
by these educators dealt with biased or faulty motivation, administration, 
and scoring of the instrument, with ambiguities in questions, and with 
the factual nature and poor selection of content. 


The public controversy was apparently concluded by a defense of the 
test and an attack upon its critics by Fraser (51), a pro and con treatment 
of the original interpretation of findings respectively by Hunt and Fine 
(63), and a report of the Committee on American History in Schools 
and Colleges of the American Historical Association, the Mississippi Val- 
ley Historical Association, and the National Council for the Social Studies 
(135). Altho this committee was appointed before the Times test was 
given, its report presented and interpreted results from a carefully con- 
structed objective test on understanding of United States history. The 
committee pointed out that the findings did not support the conclusion of 
meager or ineffective instruction in high-school American history.


CHAPTER VII


Statistical Theory: Some Recent Developments 
PAUL BLOMMERS 


THE FIRST part of this chapter is devoted to a review of some recent
developments in statistical theory. The review covers only the periodical 
literature which appeared from January 1943 to the present time. It has 
been divided into four somewhat overlapping subsections which, in order 
of presentation, treat questions of prediction, estimation and description, 
statistical inference in the nonparametric case, and other miscellaneous 
problems of statistical inference. An effort was made in the case of the 
first three subsections to provide sufficient detail to give the reader a 
general notion of certain recent developments without his having to refer 
to original sources. The relatively large number of articles covered in 
the last subsection made such a treatment impossible. The last subsec- 
tion, therefore, reduces to little more than annotated bibliography. 

Considerable difficulty was experienced in the selection of material 
which it was thought might be useful to educational research workers. 
Some of the theory reviewed has not yet been applied to problems of 
educational research and perhaps never can be. On the other hand, some 
of the theory not reviewed may in time prove to be of the greatest 
importance. 

Finally, it should be noted that many phases of statistical theory are 
rapidly becoming highly abstract. Toward such phases of the theory the 
attitude of the practical statistician should be one of tolerance, for it 
is never known when the theory may provide new tools which are not 
only more widely applicable but also simpler to use. 


The Prediction Problem 


The needs of the armed forces have, during the past few years, forced 
new attention on the old problem of classifying individuals on the basis 
of test scores plus past experience. The problem may be summarized 
as follows: Given continuous scores on tests 1 thru t for each of N
individuals selected at random from a specified population, and also a
continuous measure of the amount of some trait which past experience has
shown each of these N individuals to possess; let it be required to estab- 
lish a scheme for classifying other individuals selected from this popula- 
tion with reference to the trait involved when given only their scores 
on the t tests. The classical solution of this problem is, of course, provided
by multiple regression. 

A significant shortcoming of the multiple regression solution lies in the 
involved and laborious computational processes accompanying the deter- 
mination of the multiple regression weights. A number of schemes for 
approximating these weights have been suggested by Jackson (26). Using
actual data, Jackson compared the effectiveness of each scheme, as meas- 
ured by the correlation between the resulting estimated trait classification 
(usually called the predicted trait score) and the actual trait classification 
as based on past experience (usually called the criterion score), with the 
effectiveness of the classical least squares determination of these weights 
as measured by the multiple correlation coefficient. On the basis of this 
empirical evidence he concluded that no one of the suggested approximate 
methods was best under all conditions and selected from among these 
methods three which seemed most promising. His recommendation was 
that each of these methods be given a preliminary trial and that one of 
the three be selected which results in the highest correlation between 
predicted and criterion measures. 

The least laborious of the three approximate methods recommended by 
Jackson requires the covariance between the scores on each test and the 
scores on the criterion, and the variance of the scores on each test. How- 
ever, these same data represent the major part of the information required 
by the other two suggested methods, so that the joint trial of the three 
methods is not as laborious as might first be supposed. Since the multiple 
regression solution requires the complete matrix of covariances, it appears 
on paper, at least, that Jackson’s suggestions accomplish the purpose in- 
tended. 
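
The comparison Jackson made can be illustrated with a minimal sketch. The review does not state his formulas, so the approximate scheme below (each weight taken as the test-criterion covariance divided by the test variance) is an assumption chosen only because it uses the quantities named above; the scores are invented, and only two tests are used so that the least squares weights can be written out directly.

```python
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance; cov(a, a) is the sample variance."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def corr(a, b):
    return cov(a, b) / sqrt(cov(a, a) * cov(b, b))

def composite(tests, weights):
    """Weighted sum of the test scores for each individual."""
    return [sum(w * t[i] for w, t in zip(weights, tests))
            for i in range(len(tests[0]))]

def approximate_weights(tests, criterion):
    # ASSUMED scheme (not necessarily Jackson's): each weight uses only the
    # test-criterion covariance and the variance of that one test.
    return [cov(t, criterion) / cov(t, t) for t in tests]

def least_squares_weights(x1, x2, criterion):
    # Two-test normal equations; these also need the covariance between tests.
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, criterion), cov(x2, criterion)
    det = s11 * s22 - s12 ** 2
    return [(s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det]

# Invented illustrative scores for eight individuals.
x1 = [52, 61, 47, 70, 58, 66, 49, 73]
x2 = [30, 28, 35, 41, 33, 38, 27, 44]
y = [3.1, 3.4, 2.9, 4.2, 3.3, 3.9, 2.7, 4.5]

r_approx = corr(composite([x1, x2], approximate_weights([x1, x2], y)), y)
r_least_squares = corr(composite([x1, x2], least_squares_weights(x1, x2, y)), y)
print(round(r_approx, 4), round(r_least_squares, 4))
```

The second figure is the multiple correlation coefficient, which no other linear weighting can exceed in the sample from which the weights were derived; the gap between the two figures is the price paid for the lighter computation.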

Another important shortcoming of the classical multiple regression 
solution lies in the fact that very frequently no continuous measure of the 
trait (criterion) concerned is available thru past experience. In fact it 
is frequently impossible to obtain such a measure because of the nature of 
the trait involved. A problem arises, then, differing from the one pre- 
viously stated only in the nature and extent of the information available 
thru past experience. 

Some years ago Fisher¹ devised a procedure for estimating test weights
in such a way that a linear combination of the weighted scores (called the 
discriminant function) would provide a maximum discrimination between 
two groups of individuals with reference to some trait. The only informa- 
tion needed in addition to the test scores was a knowledge, based on past 
experience, of the group to which each individual belonged—that is, did 
he or did he not possess the trait, or was he or was he not successful in 
accomplishing a given task. Because of the possible applicability of the 
discriminant function in psychological and educational work, Garrett 
(18) has provided a simple discussion of the theory underlying it. An 
important feature of the method is the provision of a test of the effec- 
tiveness with which the obtained discriminant function classifies the 
individuals. Garrett’s presentation helped to clarify the procedure by 
showing its relationship to the multiple regression solution. By assigning 
scores of zero and one to the two classes and applying classical multiple
regression procedures, he obtained a solution identical to that yielded
by Fisher’s method. When this was done the test of the significance of
the multiple correlation coefficient was identical with Fisher’s analysis of
variance test of the significance of the discrimination provided by his
solution. It should be noted that the method may be extended to situations
in which past experience provides a classification of the individuals
into more than two groups.

1 Fisher, R. A. “The Statistical Utilization of Multiple Measurements,” Annals of Eugenics, 1938.
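
The agreement between the two solutions can be checked numerically. The sketch below is illustrative only: the scores are invented, two tests are used so that the equations can be solved by hand, and the function names are hypothetical. The discriminant weights are taken as the pooled within-group sums of squares and products solved against the difference between group means, and the regression weights come from scoring the two classes one and zero. The two sets of weights are expected to agree up to a constant of proportionality, so their ratios are compared.

```python
def mean(v):
    return sum(v) / len(v)

def sscp(a, b):
    """Sum of products of deviations from the means."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b))

def solve2(a11, a12, a22, r1, r2):
    """Solve a symmetric two-by-two system of linear equations."""
    det = a11 * a22 - a12 * a12
    return ((r1 * a22 - r2 * a12) / det, (r2 * a11 - r1 * a12) / det)

# Invented scores on two tests (x1 list, x2 list) for each group.
passers = ([62, 70, 58, 75, 66], [40, 44, 35, 47, 38])
failers = ([50, 55, 61, 47, 53], [33, 30, 39, 28, 36])

def discriminant_weights(g1, g0):
    # Pooled within-group sums of squares and products.
    w11 = sscp(g1[0], g1[0]) + sscp(g0[0], g0[0])
    w12 = sscp(g1[0], g1[1]) + sscp(g0[0], g0[1])
    w22 = sscp(g1[1], g1[1]) + sscp(g0[1], g0[1])
    d1 = mean(g1[0]) - mean(g0[0])
    d2 = mean(g1[1]) - mean(g0[1])
    return solve2(w11, w12, w22, d1, d2)

def regression_weights(g1, g0):
    # Criterion scored one for the first group and zero for the second.
    x1, x2 = g1[0] + g0[0], g1[1] + g0[1]
    y = [1] * len(g1[0]) + [0] * len(g0[0])
    return solve2(sscp(x1, x1), sscp(x1, x2), sscp(x2, x2),
                  sscp(x1, y), sscp(x2, y))

w = discriminant_weights(passers, failers)
b = regression_weights(passers, failers)
print(w[0] / w[1], b[0] / b[1])  # the two ratios should coincide
```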

This same problem has been solved by Wald (51). Wald’s approach
required the same information as is required for use of the discriminant
function, viz., measures on each of t tests for $N_1$ individuals drawn at
random from one population, say category A (possessors or passers), and
measures on the same t tests for $N_2$ individuals drawn at random from a
second population, say category B (nonpossessors or failers). Wald derived
a statistic (incidentally it may be noted that this statistic is proportional
to the discriminant function) which may be used to test the
hypothesis that a single individual drawn at random from population C,
and for whom scores on the same t tests are available, is not a member of
population A, it being known a priori that population C is identical with
either population A or B. For large values of $N_1$ and $N_2$ the distribution
of Wald’s statistic is normal with a calculable mean and variance. Wald
also provided the exact sampling distribution of his statistic for small
values of $N_1$ and $N_2$.

An interesting aspect of Wald’s approach arises from the fact that there 
exists only a single allowable alternative hypothesis. This makes it pos- 
sible to set up a critical region (i.e., a region containing values of the 
statistic which would occur a set proportion of the time if the hypothesis 
is true) so as to take into account both types of error. Suppose, for example, 
that $W_1$ and $W_2$ are two positive numbers expressing respectively the
importance of an error of the first type (rejection of the hypothesis when
it is true) and of the second type (accepting the hypothesis when the
alternative is true). These values can, of course, only be established with
reference to the purpose for which the classification is to be made. Wald
described a procedure for determining the size of the critical region for
any weightings ($W_1$ and $W_2$) of the two risks. The solution for the case
$W_1$ equals $W_2$ is given specifically.
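
How the weights might move the boundary of the critical region can be sketched under stated assumptions. The review does not give Wald's rule, so the criterion used here (choose the cutoff that minimizes the weighted sum of the two error probabilities) is an assumption and may differ from his. The statistic is taken to be normal under both the hypothesis and the alternative, as in the large-sample case, with invented means and standard deviations, and the hypothesis is rejected when the statistic exceeds the cutoff.

```python
from statistics import NormalDist

def weighted_risk(cutoff, under_h, under_alt, w1, w2):
    alpha = 1 - under_h.cdf(cutoff)  # first type: reject a true hypothesis
    beta = under_alt.cdf(cutoff)     # second type: accept when the alternative is true
    return w1 * alpha + w2 * beta

def best_cutoff(under_h, under_alt, w1, w2, steps=2000):
    # Simple grid search between the two distributions (plus a margin).
    low = under_h.mean - 4 * under_h.stdev
    high = under_alt.mean + 4 * under_alt.stdev
    grid = [low + (high - low) * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda c: weighted_risk(c, under_h, under_alt, w1, w2))

# Invented large-sample distributions of the statistic.
under_h = NormalDist(mu=0.0, sigma=1.0)
under_alt = NormalDist(mu=2.0, sigma=1.0)

equal_weights = best_cutoff(under_h, under_alt, 1, 1)
heavier_first = best_cutoff(under_h, under_alt, 3, 1)
print(equal_weights, heavier_first)
```

With equal weights and equal spread the cutoff falls midway between the two means; weighting the first type of error more heavily moves the cutoff away from the hypothesis mean and so shrinks the critical region.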

It is not uncommon in educational research for the predictive variables 
(called test scores in the foregoing discussion) as well as the criterion 
variable to be fundamentally qualitative in nature. The resulting predic- 
tion problem may be summarized as follows: Let N individuals, selected 
at random from a specified population, be distributed among a set of $n_1$
purely qualitative categories, and let the individuals in each of these
categories be classified on the basis of past experience into two groups
(e.g., passers and failers) with reference to some trait. Then let it be
required to assign weights to each of the $n_1$ categories in such a way that
the relationship between the weighted categories and the criterion trait 
will be a maximum. This accomplished, the procedure can be applied to
other sets of $n_i$ categories, and each set of categories regarded as a test
in a multiple regression problem.

Wherry (54) provided a least squares solution to the problem of assigning
weights to the categories of a given set so that the resulting biserial or
point-biserial correlation will be a maximum. The result—weight of 
category must be proportional to the percentage of “passers” in the 
category—is extremely simple, and is one which has been employed in 
the past, but without the confidence that accompanies the use of a 
technic for which a sound theoretical basis has been established. 
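
The result can be illustrated on a small invented set of records. In the sketch below each category is weighted by its proportion of passers, and the resulting point-biserial correlation is compared with that obtained from every assignment of the arbitrary weights 1 thru 4 to the four categories. The data and function names are hypothetical.

```python
from itertools import permutations
from math import sqrt

def corr(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)

# Invented records: (category, 1 for a passer or 0 for a failer).
records = [("A", 1), ("A", 1), ("A", 0),
           ("B", 1), ("B", 0), ("B", 0), ("B", 0),
           ("C", 1), ("C", 1), ("C", 1), ("C", 0),
           ("D", 0), ("D", 0), ("D", 1)]

def pass_rate_weights(records):
    """Weight each category by its proportion of passers."""
    totals, passes = {}, {}
    for category, passed in records:
        totals[category] = totals.get(category, 0) + 1
        passes[category] = passes.get(category, 0) + passed
    return {c: passes[c] / totals[c] for c in totals}

def weighted_correlation(records, weights):
    """Point-biserial correlation between category weights and pass or fail."""
    x = [weights[c] for c, _ in records]
    y = [p for _, p in records]
    return corr(x, y)

categories = sorted(pass_rate_weights(records))
r_pass_rate = weighted_correlation(records, pass_rate_weights(records))
r_best_other = max(
    abs(weighted_correlation(records, dict(zip(categories, p))))
    for p in permutations((1, 2, 3, 4))
)
print(round(r_pass_rate, 4), round(r_best_other, 4))
```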

Consider now the case in which a given individual may belong to more
than one of the n purely qualitative categories. Since an individual either
does or does not belong to each of the n categories there are in all $2^n$
unique classes into which the individuals may be classified. Johnson (30)
outlined an ingenious time-saving procedure for effecting the classification
of the individuals into the $2^n$ classes. The members of each of the
$2^n$ unique classes are then classified with reference to the criterion trait.
A statistic is suggested which may be used as a basis for selecting from
among the $2^n$ classes those for which the criterion classification varies
significantly from what would be expected under the hypothesis that
the members of the unique classes are equally distributed with reference
to the criterion trait. Membership in the classes thus selected becomes the
basis for prediction, and the use of a contingency table is suggested as
a basis for analyzing the efficacy of the procedure.
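
Johnson's own procedure is not described here, so the following sketch substitutes a simple binary coding, which is an assumption and not his method: membership or non-membership in each of the n categories is treated as one binary digit, so that every individual falls into exactly one of the $2^n$ classes, and passers and failers are then counted within each class. The names and data are hypothetical.

```python
from collections import defaultdict

def class_index(memberships):
    """Map n yes-or-no memberships to one integer from 0 to 2**n - 1."""
    index = 0
    for position, is_member in enumerate(memberships):
        if is_member:
            index |= 1 << position
    return index

def tabulate(people):
    """Count [passers, failers] within each occupied class."""
    table = defaultdict(lambda: [0, 0])
    for memberships, passed in people:
        table[class_index(memberships)][0 if passed else 1] += 1
    return dict(table)

# Invented data: memberships in two categories and the criterion outcome.
people = [((True, False), True),
          ((True, False), False),
          ((False, False), False),
          ((True, True), True),
          ((True, True), True)]

table = tabulate(people)
print(table)
```

Classes whose division into passers and failers departs markedly from that of the whole group would then be the candidates for selection; the statistic for judging that departure is left to the original article.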

In this connection it may be noted that Johnson (28, 29) has devised 
a coefficient of selectivity, which is simply the relative gain in the number 
properly classified with reference to the criterion trait as a result of 
applying the scheme described. He has also devised a coefficient of 
correctivity which is the proportion of misclassified individuals properly 
reclassified by the scheme. Both coefficients appear to have a wider applica- 
tion. The relation of the latter coefficient to the fourfold Pearson r is 
discussed in a manner which should help to clarify the interpretation of 
both coefficients. It is well known that the fourfold r can assume its
maximal value, unity, only when the proportion succeeding on the criterion
variable ($P_1$) equals the proportion succeeding on the predictive
variable ($P_2$). Since the values of $P_1$ and $P_2$ are often arbitrarily set on
grounds quite apart from the matter of prediction, Johnson suggested
that the predictive efficiency of an application of his scheme be judged
on the basis of the maximum possible efficiency attainable for the specified
values of $P_1$ and $P_2$.²

2 This discussion appears particularly pertinent since much use has been made recently of the fourfold
Pearson r in the development of test theory. By this use certain writers have been led to conclude that
all items of a test should be made equally difficult. Given perfectly reliable items such a test would result
in a dichotomous distribution of scores. It would seem more appropriate, if the fourfold coefficient
(Pearsonian) must be used, to evaluate the relationship between items in terms of the maximum value
of this coefficient for given values of $P_1$ and $P_2$. Still more appropriate would be the use of the tetrachoric
coefficient, which use in theoretical discussion would probably go far toward aligning theory and established
practice.
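
The ceiling on the fourfold r for fixed marginal proportions can be computed directly. In the sketch below the coefficient is written in terms of the proportion succeeding on both variables, and the maximum is reached when that joint proportion equals the smaller of $P_1$ and $P_2$. The ratio of an observed coefficient to this ceiling is offered only as one possible reading of judging efficiency against the attainable maximum; it is an assumption, not Johnson's stated index.

```python
from math import sqrt

def fourfold_r(p_both, p1, p2):
    """Fourfold (phi) coefficient from the joint and marginal proportions."""
    return (p_both - p1 * p2) / sqrt(p1 * (1 - p1) * p2 * (1 - p2))

def fourfold_r_max(p1, p2):
    # The joint proportion cannot exceed the smaller marginal proportion.
    return fourfold_r(min(p1, p2), p1, p2)

def relative_to_maximum(p_both, p1, p2):
    # ASSUMED index: observed coefficient as a fraction of its ceiling.
    return fourfold_r(p_both, p1, p2) / fourfold_r_max(p1, p2)

print(fourfold_r_max(0.5, 0.5))            # equal proportions
print(round(fourfold_r_max(0.3, 0.6), 4))  # unequal proportions
```

When the two proportions are equal the ceiling is unity; when they differ it falls below unity however close the underlying relationship may be.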

As a final problem in the area of prediction consider the following:

Given a single continuous measure, x, (e.g., a test score) for each of 
N individuals selected at random from a specified population, and also 
a measure, y, either continuous or dichotomous, of the amount of some 
trait which past experience has shown each of these individuals to 
possess; let it be required to establish a critical or borderline value of x which,
when applied to other individuals drawn from the specified population, 
and for whom the y measure is not immediately available, will minimize 
the discrepancies between the available x score and the y score which will 
be ultimately attained. Solutions to both cases of this problem (y continu- 
ous and y dichotomous) have been provided by Burt (7), who pointed out 
that while the formulas obtained have demonstrated their value in certain 
theoretical discussions, they are not necessarily the most suitable, due, for 
one thing, to the fact that administrative conditions rarely permit the 
complete freedom that would be necessary to establish a borderline on the 
basis of the minimal discrepancy criterion alone. 


Problems of Estimation and Description 


Problems of estimation involve the determination of procedures for 
calculating sample statistics which are suitable estimates of population 
parameters. Problems of description involve the determination of pro- 
cedures for calculating indices (descriptive statistics) which suitably de- 
scribe certain characteristics of a given mass of data. Certain recent 
contributions in the areas of estimation and description are considered 
in this section. 

The method of estimating the variance ($\sigma_1^2$) of a population, given
the variance ($s_1^2$) of a random sample taken from the population, is well
known. Suppose, however, that in addition to $s_1^2$ there is available the
variance ($s_2^2$) of a random sample taken from another population having
a variance of $\sigma_2^2$, say, where $\sigma_1^2 \le \sigma_2^2$. Is it possible to use $s_2^2$ in conjunction
with $s_1^2$ to obtain an improved estimate of $\sigma_1^2$? This question is discussed
by Bancroft (3), who suggests the following procedure: First, compute the
F-ratio, $s_2^2$ to $s_1^2$. If this ratio is not significant use $s_1^2$ and $s_2^2$ in the
familiar formula for estimating the variance of a population given the
variances of two independent random samples from this population. If
the F-ratio is significant, base the estimate of $\sigma_1^2$ on $s_1^2$ only.

Bancroft found practically no bias to result from use of this pro- 
cedure when the 20 percent level was used as the criterion for the signifi- 
cance of the F and wheng?<0.6¢.- A positive bias is introduced when 
o.=¢,. None of the various significance levels of the F which were 
studied controlled the bias thruout the zero to one range of the ratio 
@. to o', which this investigation covered. The variance of the variance 
estimate based on both s; and gs, will usually be less than the variance of 
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the estimate based on gs alone. This is of little advantage, however, unless 
the bias is adequately controlled. Bancroft also discussed an analogous 
procedure for choosing between the regression equations y=b,x,+-bex, 
and y’=b’,x,. 
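The preliminary-test rule attributed to Bancroft above may be sketched as follows. This is only an illustration under stated assumptions: the function name and arguments are hypothetical, the two sample variances are taken to be unbiased estimates weighted by their degrees of freedom, and the critical value of F must be supplied by the user from a table at whatever level is chosen.

```python
def preliminary_test_variance(s1_sq, df1, s2_sq, df2, f_critical):
    """Estimate the variance of the population of interest.

    s1_sq, df1 : sample variance and degrees of freedom for the
                 population whose variance is wanted.
    s2_sq, df2 : the same quantities for the auxiliary sample.
    f_critical : tabled critical value of F (assumed to be supplied
                 by the user for the chosen significance level).
    """
    f_ratio = s2_sq / s1_sq
    if f_ratio < f_critical:
        # Not significant: pool the two variances, weighting each
        # by its degrees of freedom.
        return (df1 * s1_sq + df2 * s2_sq) / (df1 + df2)
    # Significant: fall back on the first sample alone.
    return s1_sq
```

The bias discussed in the text arises because the choice between the two branches depends on the same data that enter the estimate.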

Frequently in validating tests criterion measures are available, not for 
the whole population, but only for a group already selected for their 
supposed ability in the very tasks measured by the test. Suppose that 
the scores on a test (x) are available for individuals representative of 
the complete population, and that for a selected group of these individuals 
(i.e., selected for their supposed ability in the tasks measured by the 
test) scores on a criterion (y) are also available. Given these data, Brog- 
den (5) has derived formulas for estimating for the complete population 
the mean and standard deviation of the criterion scores (y), and also the 
correlation between x and y. The basic assumptions are linearity of regression
and homoscedasticity. Brogden also provided an estimate of the correlation
for the complete population between y and z, where z is a measure
of some trait other than x and y, and like y, is available for only the 
selected group. The additional assumption required is an equal correla- 
tion between y and z for fixed values of x. 
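The kind of estimate described above can be illustrated with the classical correction for explicit selection on x, which rests on the same two assumptions (linear regression of y on x and homoscedasticity). The sketch below is not claimed to reproduce Brogden's own formulas; the function and argument names are hypothetical.

```python
import math


def correct_for_selection(r_sel, sd_x_sel, sd_x_full,
                          mean_x_sel, mean_x_full,
                          mean_y_sel, sd_y_sel):
    """Estimate full-population values from a group selected on x.

    Assumes the regression of y on x is linear and that the variance
    of y about the regression line is the same in both groups.
    Returns (r_full, mean_y_full, sd_y_full).
    """
    k = sd_x_full / sd_x_sel                      # ratio of x spreads
    spread = math.sqrt(1.0 - r_sel ** 2 + (r_sel * k) ** 2)
    r_full = r_sel * k / spread                   # corrected correlation
    slope = r_sel * sd_y_sel / sd_x_sel           # regression of y on x
    mean_y_full = mean_y_sel + slope * (mean_x_full - mean_x_sel)
    sd_y_full = sd_y_sel * spread
    return r_full, mean_y_full, sd_y_full
```

When the selected group has the same spread on x as the complete population, the correction leaves the correlation unchanged.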

A more general solution to this same problem of estimation is given by 
Burt (7), who considered the case in which there is more than one set of 
test scores and more than one set of criterion measures. Burt’s results 
are given in matrix form. In this connection it should be noted that Davis 
(14) has provided a formula for estimating the reliability of a test for 
a complete population when given (a) the reliability for a curtailed popu- 
lation, (b) the standard deviation of the curtailing variable in the 
complete population, (c) the standard deviation of the curtailing variable 
in the curtailed population, and (d) the correlation between the test and 
the curtailing variable for the curtailed population. This, of course, 
represents a specific application of the problem considered by Brogden 
and Burt. A strong word of caution against the use of these formulas 
in situations in which the assumptions they imply are not fulfilled is 
given by Burt. 

Attention is next turned to moment statistics. Scates (44) presented
findings of such a nature as to cast doubt upon the usefulness of β₂ (i.e.,
the ratio of the fourth moment about the mean to the square of the second
moment about the mean, sometimes represented by α₄) as a measure of
kurtosis. He showed that for weights of any given magnitude there is a
pair of points (one on each side of the mean) in a normal curve at which
the weights can be added without affecting β₂. He also demonstrated
that the addition of small weights in the tails of a normal distribution
greatly increases β₂, whereas large weights can be added near the mean
without affecting its value. By way of comment it may be noted that β₂ = 3
is not a sufficient condition for normality, and that Scates’ results are,
therefore, precisely what would be expected.
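The sensitivity of the fourth-moment ratio to the tails can be seen in a small computation. The sketch below computes the ratio of the fourth moment about the mean to the square of the second for a weighted set of values; the function name and the example distribution are illustrative assumptions, not taken from Scates.

```python
def beta2(values, weights=None):
    """Fourth moment about the mean divided by the squared second moment."""
    if weights is None:
        weights = [1.0] * len(values)
    total = float(sum(weights))
    mean = sum(w * v for v, w in zip(values, weights)) / total
    m2 = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    m4 = sum(w * (v - mean) ** 4 for v, w in zip(values, weights)) / total
    return m4 / m2 ** 2


# A symmetric bell-shaped set of frequencies, then the same set with
# one extra unit of weight placed in each tail.
base = beta2([-2, -1, 0, 1, 2], [1, 4, 6, 4, 1])
with_tails = beta2([-3, -2, -1, 0, 1, 2, 3], [1, 1, 4, 6, 4, 1, 1])
```

In this example the two added observations raise the ratio from 2.5 to roughly 3.15, although they make up only one-ninth of the total weight.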

Under certain conditions it may be desirable to publish the basic data
of a statistical study. Pierce (39) has devised a scheme for presenting a 
grouped frequency distribution in such a way that the moments for the 
ungrouped data may be obtained from it. Pierce’s scheme and the accom- 
panying correction formulas give the exact values of the moments for the 
ungrouped data, and not average corrections of the type given by the 
usual correction formulas. Pierce aptly pointed out that, whereas the 
usual correction formulas provide unbiased estimates in the sense that 
they eliminate systematic errors due to grouping, the use of these 
formulas may in a given case actually make the estimate worse. A 
correction, based on parabolic interpolation, for grouping errors present 
in the second moment about zero, has been developed by Davies and 
Bruner (13). The formula is identical with Sheppard’s at the limiting 
case of a continuous scale and high contact, and yet is adaptable either
to an integral number of equally wide subclasses, or to a continuous 
scale. 

By an extension of the moment concept, Rodrigues (42) has developed 
new measures of variability, general similarity, and overlapping. The 
extension is effected by writing the usual definition of the r-th moment 
of a variable, xᵢ, about the origin xⱼ, and then summing over j as well as i.
This idea is then applied to two distributions in such a way as to yield the 
aggregate total moment of one distribution about the other. It is this last 
mentioned development which leads to the indices of general similarity and 
overlapping. Whether or not the indices become of practical value depends 
largely upon the development of tests of statistical hypotheses concerning 
their magnitude. Inasmuch as there is a need in educational research for 
statistics facilitating the study of overlap, it is to be hoped that these 
tests will soon be forthcoming. 

Hirschman (25) has suggested that it is often required, after a dichot- 
omous (good-bad) classification has been established, to inquire into the 
average “good” student, or into the variability of the “good” students. 
To this end, he has discussed some simple algebraic relationships between 
certain descriptive statistics for the subseries and those for the entire 
distribution. Hirschman’s discussion will be of particular interest to 
teachers of statistics, who may obtain from it some suggestions for clari- 
fying for their students the interpretation of the common measures of 
dispersion and skewness. 

Statistics descriptive of the relationship between variables have been 
given attention by a number of writers. In considering statistical problems 
in test evaluation, Burt (7) discussed the problem of estimating the valid- 
ity of a test (i.e., of estimating the relationship between the test scores 
and the criterion scores) when the criterion takes the form of a twofold or 
threefold classification. When interest centers primarily in the validity 
of the test near a borderline, Burt expressed a preference for the use of 
the tetrachoric coefficient over the biserial coefficient. He also provided 
formulas for triserial correlation and for the point-biserial coefficient.
Tables facilitating the estimation of the standard error of the tetrachoric
coefficient have been worked out by Hayes (24). The accuracy of this
estimated standard error is reduced as a result of the fact that the estimate
is “unstudentized”; the parametric value of the tetrachoric coefficient
appears in the formula for estimating its standard error. The usefulness
of this standard error is limited since the exact sampling distribution
of the tetrachoric coefficient is not known.

The concept of correlation as the ratio of the variation in the dependent 
variable which is explained by variation in the independent variable(s) 
to the total variation in the dependent variable is gaining considerable 
popularity. Perhaps the greatest advantage of this concept lies in its 
generality, for it may be used to define simple correlation, multiple cor- 
relation, partial correlation, curvilinear correlation, and, of course, the 
correlation ratio. A discussion of this concept which will be of interest 
to teachers of statistics has been provided by Cowden (11), who gives 
an interesting diagrammatic representation of the concept. The main 
purpose of Cowden’s article is to show how this definition of correlation 
is basic to the Doolittle solution. Teachers of statistics may also be inter- 
ested in Platt’s scheme (40) for the mechanical determination of corre- 
lation coefficients and standard deviations. The usefulness of Platt’s 
scheme for bringing out the principle that correlation is a measure of 
scatter from a perfect prediction line is limited by the fact that it requires 
of the students some knowledge of the principles of mechanics. 
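The definition of correlation as a ratio of explained to total variation can be checked numerically for the simplest case, a straight-line fit of y on x. The sketch below is illustrative only; the function name is hypothetical and the least-squares line is assumed as the predictor.

```python
def explained_variation_ratio(x, y):
    """Ratio of explained to total variation in y for a linear fit on x.

    For a least-squares straight line this ratio equals the square of
    the ordinary product-moment correlation coefficient.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * a for a in x]
    explained = sum((p - mean_y) ** 2 for p in predicted)
    total = sum((b - mean_y) ** 2 for b in y)
    return explained / total
```

The same ratio, with a different predictor in place of the straight line, gives the multiple, partial, and curvilinear cases mentioned in the text.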

Two new descriptive statistics were encountered. A coefficient has
been devised by Janis and Fadner (27) for the purpose of presenting an 
over-all estimate of the degree of imbalance, that is, the extent to which 
favorable, neutral, or unfavorable treatment is accorded a given topic 
in a given piece of writing. The coefficient is defined by joint con- 
sideration of two functions, viz., one in which favorable content dominates 
and one in which unfavorable content dominates, and is shown to conform 
to the ten criteria which Janis and Fadner regard as defining the concept 
of imbalance. In the final analysis, however, the validity of the coefficient
depends on how well the user can form several subjective judgments,
such as defining the unit of content, and classifying units of content
as relevant or irrelevant, or as favorable, neutral, or unfavorable.

A new statistic for the interpretation of the validity of a test has been
devised by Richardson (41). The statistic is based on a fourfold table
and is in terms of a measure of the increased efficiency of the group 
selected by the test. The ratio (k) of the average effectiveness of the group 
rated successful on the criterion to the average effectiveness of the group 
rated unsuccessful on the criterion is required. The statistic is readily inter- 
pretable, but, as Richardson himself points out, it is limited by difficulties 
which may arise in estimating k. 

In concluding this section attention should be called to a report by 
Krathwohl (32) on a method for comparing the achievement of classes 
with their ability. The method involved is reminiscent of certain quality
control technics. As yet little application has been found for these technics
in educational research. Educational research workers ought, nevertheless,
to keep informed on developments in this rapidly expanding branch of
applied statistics.³


Problems of Statistical Inference in the Nonparametric Case 


The most common problems of statistical inference are solved by assum- 
ing the form of the sampling distribution to be determined in a known 
way by certain parameters, the values of which are unknown. It is about 
the values of these unknown parameters that the inferences are to be 
made. Such problems are classified under the heading of the parametric 
case. This case includes all the theory based on normality assumptions (46). 

However, many problems in sampling theory may be reduced to an 
enumeration of possible combinations and to the determination of the 
probability of the occurrence of certain specified combinations. The 
solutions of such problems are independent of the form of the population 
distribution function and assume only its continuity. Unfortunately when 
the number of observations is large the calculations involved in applying 
a combinatorial analysis become extremely tedious. It is sometimes possible 
to circumvent this difficulty by determining the asymptotic distributions 
of the combinations. Such problems are illustrative of the type which are 
classified under the heading of the nonparametric case. 

Much attention has recently been given to the nonparametric case in 
statistical inference. Scheffé (46) provided a fifty-eight item bibliography 
of contributions in this area. It is not possible to describe in detail here 
the many nonparametric case problems which have been solved. A few 
solutions will be discussed briefly and others will be cited. 

Consider first two solutions of the problem of two samples which may be 
stated as follows: Let x₁, ..., xₘ and y₁, ..., yₙ be two random
samples from continuous univariate populations. It is required to test 
the hypothesis that the distribution functions of these populations are 
the same. 

The first solution to be presented was originally proposed by Wald and
Wolfowitz⁴ and is illustrated in a paper by Swed and Eisenhart (49).
The procedure is as follows: Arrange the observations from both samples
in a single series in increasing order of magnitude. In doing this attach
some distinguishing character (such as an accent or prime) to the observa-
tions of the second sample. When two different kinds of objects (i.e., the
observations of the first sample as distinguished from the observations of
the second sample) are thus arranged in series, they will form two or more
distinct groups or runs of like objects. For example, in the arrangement
xxyyyxy, there are four distinct groups or runs. In this test small
values of the statistic, which is simply the number of runs, are significant.
Swed and Eisenhart provide tables giving the probability of obtaining a
number of runs equal to or less than the number observed, under the
hypothesis that the two populations involved are the same. The continuity
of the population distributions is assumed. The tables are entered with
the values of m, n, and u′ (u′ is the observed number of runs). When
m + n is large and the ratio m/n fixed, u′ is approximately normally
distributed about a readily calculable mean and variance.

3 A simple introduction to certain quality control technics may be had from two publications (May 1941
and April 1942) of the American Standards Association, 70 E. 45th St., New York. The titles in order of
publication date are: Guide for Quality Control and Control Chart Method of Analyzing Data and
Control Chart Method of Controlling Quality During Production.

4 A. Wald and J. Wolfowitz, “On a Test Whether Two Samples Are from the Same Population.”
Annals of Mathematical Statistics. 11: 147-62, June 1940.
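The run-count procedure described above may be sketched as follows. The mean and variance used are the standard large-sample values for the number of runs in a random arrangement of m objects of one kind and n of another; the function names are hypothetical, and ties between the two samples are assumed not to occur, in keeping with the continuity assumption.

```python
import math


def count_runs(labels):
    """Number of runs of like symbols in a sequence."""
    return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)


def runs_test(xs, ys):
    """Two-sample run test with a normal approximation.

    Returns (u, mean, variance, lower_tail_probability). Small values
    of u, the number of runs, count against the hypothesis that the
    two populations are the same.
    """
    pooled = sorted([(v, "x") for v in xs] + [(v, "y") for v in ys])
    labels = [label for _, label in pooled]
    u = count_runs(labels)
    m, n = len(xs), len(ys)
    mean = 2.0 * m * n / (m + n) + 1.0
    variance = (2.0 * m * n * (2.0 * m * n - m - n)
                / ((m + n) ** 2 * (m + n - 1.0)))
    z = (u - mean) / math.sqrt(variance)
    # Lower-tail area of the unit normal curve.
    lower_tail = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return u, mean, variance, lower_tail
```

For samples as small as those in the test below the normal approximation is rough, and exact tables of the kind described in the text would be preferred.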

The second solution to be presented was devised by Mathisen (35).
The procedure may be outlined as follows: Draw a sample of size 2n + 1
and determine the median. Draw a second sample of size 2m and let m₁
be the number of observations in the second sample which are below the
median of the first. Mathisen has obtained the probability function for m₁.
This function is independent of the population distribution function, f(x),
and assumes only the continuity of f(x) and the independent random
selection of the two samples from f(x). For large samples
[m₁ − E(m₁)]/σ_{m₁} is normally distributed about zero with unit variance.
Formulas are given for E(m₁) and for σ_{m₁}. Mathisen also presented an
analogous solution based on the quartile points. It should be noted that
Bowker (4) has shown that Mathisen’s tests are inconsistent⁵ when the
samples are from different populations which have identical cumulative
frequency distributions in the neighborhood of their medians or quartile
points. If these possibilities are not admissible the tests are consistent.
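The median procedure just described lends itself to a short illustration. The probability function below is the standard combinatorial result for the number of observations in one sample falling below a given order statistic of another, when both come from the same continuous population; it is offered as a reconstruction under that assumption and is not quoted from Mathisen's paper. Function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def count_below_median(first, second):
    """Number of second-sample values below the median of the first.

    The first sample is assumed to have odd size, so that its median
    is a single observation.
    """
    median = sorted(first)[len(first) // 2]
    return sum(1 for v in second if v < median)


def median_count_pmf(n, m):
    """Exact null probabilities for counts 0, 1, ..., 2m.

    The first sample has size 2n + 1 and the second has size 2m.
    """
    total = comb(2 * n + 2 * m + 1, 2 * m)
    return [Fraction(comb(n + k, k) * comb(n + 2 * m - k, 2 * m - k), total)
            for k in range(2 * m + 1)]
```

Under this reconstruction the expected count is m, so that observed counts far from m in either direction are the ones that tell against the hypothesis.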

The merit of these solutions to the two-sample problem is that they
assume only that the population distribution function is continuous and 
that the samples are drawn at random independently. Their weakness, of 
course, lies in their inefficiency in the sense that they do not make full use 
of the information given by the data and in their consequent lack of 
power. That is, a rather gross disparity between samples is necessary to 
yield significance. Hence when tests involving additional assumptions are 
plausible such tests should be used (49). Wolfowitz (55) also pointed 
out that the extreme generality of the hypothesis tested is a limiting factor 
in the general usefulness of these tests. 

Wald and Wolfowitz (52) have devised an exact test for randomness in 
the nonparametric case which is based on the concept of serial correlation.⁶
The statistic used is not actually the serial correlation coefficient but
one which results in an equivalent test. Let x₁, ..., xₙ be the observations
of a sample in the order of drawing. Then the hypothesis to be tested
is that x₁, ..., xₙ are independent observations from the same population.





5 A statistical test is called consistent if the probability of rejecting the null hypothesis when it is false 
approaches one as the sample number approaches infinity. 


6 A bibliography and brief review of the theory of serial correlation have also been given by Dixon (15).

The statistic used is

\[ R = \sum_{i=1}^{n} x_i x_{i+h}, \]

where x_{i+h} is replaced by x_{i+h-n}
for all values of i + h greater than n. It is necessary to choose the lag, h,
on the basis of the alternatives to randomness in the situation under con-
sideration. For example, if some sort of periodical or cyclical characteris-
tics are suspected, h should be chosen to conform to these periods. If h
is prime to n, the distribution of

\[ R' = \sum_{i=1}^{n} x_i x_{i+1} \]

is the same as the distribution of R, and consequently R′ may be used as
the statistic in such situations. Since h can always be made prime to n by
the omission of a few observations, the statistic R′ is quite generally useful.

An exact test of the significance of R or R’ can be effected by forming 
the n factorial (n!) permutations of the n observations and computing the
R or R’ for each. Since the probability of occurrence of each permutation
under the hypothesis of randomness is 1/n!, it is possible to determine
the probability of obtaining a member of a specified set of values of 
R or R’. Wald and Wolfowitz showed that under some mild restrictions the 
limiting distribution of R (R’) is normal with a calculable mean and 
variance. Provided the population distribution function is continuous 
the test under discussion does not depend on this function. Even this 
restriction of continuity may under certain conditions be unimportant in 
the limiting case. 
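For a very small sample the exact permutation test described above can be carried out by brute force. The sketch below forms the circular sum of lagged products and enumerates all n! orderings; the function names are hypothetical, and the choice of the upper tail as the critical region is an assumption made only for the illustration.

```python
from itertools import permutations


def circular_lagged_product(x, h=1):
    """Sum of x[i] * x[i+h], with the index wrapping around past n."""
    n = len(x)
    return sum(x[i] * x[(i + h) % n] for i in range(n))


def exact_upper_tail(x, h=1):
    """Proportion of the n! orderings whose statistic is at least as
    large as the observed one. Practical only for small n."""
    observed = circular_lagged_product(x, h)
    at_least = 0
    total = 0
    for ordering in permutations(x):
        total += 1
        if circular_lagged_product(ordering, h) >= observed:
            at_least += 1
    return at_least / total
```

Because the number of orderings grows as n!, the limiting normal distribution mentioned in the text is what makes the test usable for samples of ordinary size.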

Limitations on space require that other contributions to the nonpara- 
metric case be given only brief mention. Developments in the theory of 
runs,⁷ and its use in testing the randomness of a sample, have been reviewed
by Wolfowitz (56). Illustrative applications taken primarily from the field 
of quality control have been provided. Articles on runs up and down have 
been contributed by Wolfowitz (55) and by Levine and Wolfowitz (33). 
Casanova (8,9) has discussed the use of the method of runs, and also the 
use of a variety of other nonparametric tests, for testing the random order 
of the keyed responses to test items. Casanova’s suggested application of
these statistical tests is remindful of cutting butter with a razor. Neverthe- 
less, workers in educational research who are interested in studying the 
nonparametric case may find Casanova’s articles to be fairly good starting 
points. The solutions to several nonparametric problems are also described 
in an article by Wald and Wolfowitz (53), who show how a certain limiting 
distribution theorem may be applied to them. 

Three tests based on the signs of the differences of successive observa-
tions are described by Moore and Wallis (36). These tests were designed
for use in time series analysis but may prove to have a wider field of
application.

7 Wolfowitz (55) stated that runs are a matter of technic and that new advances and applications would
soon render most definitions obsolete. The following definition is given by S. S. Wilks, Mathematical
Statistics, Princeton, 1943, p. 200. “Suppose we have an arbitrary sequence of n elements, each element
being one of several mutually exclusive kinds. Each sequence of elements of one kind is called a run.”

A problem of some interest in educational research is that of deter- 
mining whether an individual has matched two series of items (e.g., hand- 
writing specimens) better than could have been done by chance. This rep- 
resents an application of what has come to be called the problem of card 
matching. Anderson (1) has provided one of the more recent discussions 
of this problem. An article by Greenwood (20) on the problem of preferential
matching⁸ may also be of some interest.

Thornton (50) discussed the use of Olds’ tables⁹ giving the probabilities
for various values of the factor Σd² as it appears in the rank order correla-
tion formula. The probabilities in Olds’ tables for n equals 2 thru 7
are based on possible combinations and are exact. The probabilities for 
larger values of n are based on asymptotic curves. Thornton pointed out 
a few minor errors in Olds’ tables and compared levels of significance of 
the rank order coefficient based on Olds’ tables with the levels as deter- 
mined by other methods. 
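Exact probabilities of the kind tabled for small n can be obtained by enumerating every possible pairing of ranks. The following sketch does this for the sum of squared rank differences; it is not a reproduction of Olds' tables, and the function names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def sum_d2_counts(n):
    """Frequency of each value of the sum of squared rank differences
    over all n! equally likely pairings of two sets of n ranks."""
    ranks = list(range(1, n + 1))
    return Counter(
        sum((a - b) ** 2 for a, b in zip(ranks, ordering))
        for ordering in permutations(ranks)
    )


def prob_sum_d2_at_most(n, value):
    """Exact probability that the sum of squared differences does not
    exceed the given value under random pairing."""
    counts = sum_d2_counts(n)
    total = sum(counts.values())
    return sum(c for s, c in counts.items() if s <= value) / total
```

The enumeration is feasible only for small n, which is consistent with the use of asymptotic curves for larger values.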


Other Miscellaneous Problems of Statistical Inference 


Users of statistical tests have become increasingly cognizant of the re- 
strictions implicit in the tests. The assumption of a normal population 
distribution has in particular drawn much attention. The use of certain 
transformations is a common method of circumventing this particular 
restriction. Curtiss (12) provided a mathematical basis for effecting square 
root, inverse sine, and logarithmic transformations leading to a normal 
distribution and a stable variance. 
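The three transformations named above take only a line each to apply. The sketch below states the usual situation in which each is employed; these pairings are the conventional ones and are given as background, not as a summary of Curtiss's derivations. Function names are hypothetical.

```python
import math


def square_root_transform(count):
    """Commonly applied to counts whose variance is proportional to the mean."""
    return math.sqrt(count)


def inverse_sine_transform(proportion):
    """Commonly applied to proportions between 0 and 1 (result in radians)."""
    return math.asin(math.sqrt(proportion))


def logarithmic_transform(value):
    """Commonly applied when the standard deviation is proportional to the mean."""
    return math.log(value)
```

After transformation the usual normal-theory tests are applied to the transformed values rather than to the original observations.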

Another approach to overcoming this restriction would be to determine
the exact sampling distributions of useful statistics for samples drawn from a
population having a specified nonnormal distribution function which is
plausible in the given situation. Festinger (16) provided an exact test of
significance for means of samples drawn from a population having an
exponential (J-shaped) distribution function. Festinger showed that the
ratio of 2n (n is the sample number) times the sample mean to the
population mean is distributed as Chi-square for 2n degrees of freedom.
This fact makes it possible to test any exact hypothesis about the magnitude
of the population mean, and to establish a critical region. Festinger has
also shown that the ratio of the larger to the smaller of two sample means,
based on independent random samples from an exponential population, is
distributed as F for 2n₁ and 2n₂ degrees of freedom, the n’s being the
respective sample numbers. Festinger (17) has developed analogous tests
for means of samples drawn from populations having Type III (skewed)
distribution functions. These tests, however, are not exact, since the
population variance which must be estimated from the sample enters into
both the Chi-square statistic and its degrees of freedom. Moreover, it ap-
pears that the inaccuracy of the tests would be greatest in those situations
in which they would otherwise be most useful, that is, when the degree
of skewness present in the population distribution is marked.

8 Preferential matching may be described as follows: Let the two sets of items to be matched be Aᵢ and
Bᵢ, i = 1, 2, ..., n. A₁ is compared with each Bᵢ, the one it most nearly matches being given a score
of n, the next a score of n − 1, etc., to 1. This procedure is repeated with each Aᵢ.

9 E. G. Olds, “Distributions of Sums of Squares of Rank Differences for Small Numbers of Individuals,”
Annals of Mathematical Statistics, 9: 133-48, 1938.
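The exact test for the mean of an exponential population described above can be sketched without tables, since the Chi-square distribution for an even number of degrees of freedom has a closed form. The function names are hypothetical, and the hypothesized mean is whatever value the investigator wishes to test.

```python
import math


def chi_square_cdf_even(x, n):
    """Cumulative probability of Chi-square with 2n degrees of freedom.

    Uses the closed form available when the degrees of freedom are even.
    """
    if x <= 0:
        return 0.0
    half = x / 2.0
    term = 1.0
    series = 1.0
    for k in range(1, n):
        term *= half / k          # (x/2)^k / k!
        series += term
    return 1.0 - math.exp(-half) * series


def exponential_mean_test(sample, hypothesized_mean):
    """Return (statistic, lower_tail, upper_tail) for the hypothesis that
    an exponential population has the stated mean."""
    n = len(sample)
    statistic = 2.0 * sum(sample) / hypothesized_mean   # 2n times mean ratio
    lower = chi_square_cdf_even(statistic, n)
    return statistic, lower, 1.0 - lower
```

A small lower-tail probability suggests that the population mean is below the hypothesized value, and a small upper-tail probability that it is above.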

Another restriction implicit in the t-test of the significance of the dif- 
ference between sample means is that the ratio between the population 
variances (usually taken to be one) must be known. Scheffé (45) dis- 
cussed a solution to the problem of two samples which is based on the 
t-distribution and which is applicable when the ratio of the population 
variances is unknown. The restriction of normal population distribution 
functions obtains, and the number of degrees of freedom involved is n − 1,
where n is the number of observations in the smaller sample. 

The approach to analysis of variance described by Jackson¹⁰ has had
increasing application. Consequently teachers and students of educational 
statistics will welcome Rulon’s straightforward exposition (43) of the 
mathematics underlying this approach. Rulon dealt with the simplest 
case, the problem of two samples, and showed the relationship between 
the z-test (or F-test) and the t-test. Other articles on analysis of variance 
have been contributed by Grant (19) and Peters (38). Both of these 
articles review the earlier Garrett and Zubin article.¹¹ Grant’s brief exposition
of the analysis of variance technic is quite clear and may be
of interest to teachers of statistical methods in education. Peters continues 
his practice of pointing out relationships between analysis of variance and 
classical methods, and of deprecating the former. 

Problems arising in the application of statistical tests when the observa- 
tions are in the form of percents or fractions have received the attention 
of Cochran (10), Baker (2), and Burr and Hobson (6). Cochran was 
primarily concerned with analysis of variance technics for percentages 
based on unequal numbers. He discussed three schemes of weighting 
such observations which are suitable under differing conditions and 
described methods of checking their efficiency in a given situation. Baker 
described a test of the significance of the difference between two treat- 
ments (x and y) applied to different pairs of groups in different localities, 
and where the effectiveness is expressed as the percent of each group that 
responds to the treatment. The theory of the test rests on the fact that if
the treatments are equally effective, then the percents will distribute them-
selves symmetrically about the line y = x. A transformation is effected
which makes this line coincide with the x-axis, and the effectiveness of
the treatments is evaluated by testing the significance of the regression
of y′ on x′ (y′ and x′ represent the transformed values of y and x).
Burr and Hobson described a method for making a mass test of the
significance of the difference between two proportions, when each pair of
proportions involved is based on the same sample numbers (i.e., n₁ and n₂
are the same for all pairs but n₁ need not equal n₂). In addition to the
usual assumptions involved in testing the significance of the difference
between two proportions, it is necessary to take cognizance of the fact
that if one hundred pairs of differences are tested and if the 5 percent
level of significance is adopted, then the expected number of significant
differences under the null hypothesis is five.

10 R. W. B. Jackson, “Applications of the Analysis of Variance and Covariance Method to Educational
Problems.” Bulletin No. 11 of the Department of Educational Research, University of Toronto, Toronto, 1940.

11 H. E. Garrett and J. Zubin, “The Analysis of Variance in Psychological Research,” Psychological
Bulletin, 40: 233-67, 1943.

Simon (47, 48) discussed the situation in which the risk of making an 
error of the first type (rejecting a true hypothesis) equals the risk of 
making an error of the second type (retaining a false hypothesis).¹²
Simon referred to tests suitable in such a situation as symmetric tests
and showed in the case of the two-sample problem that the uniformly most
powerful symmetric test of the hypothesis that the mean of population
y is greater than the mean of population x is simply that ȳ > x̄. The test
assumes that the population distribution functions are normal and have 
equal variances, and requires the sample numbers to be equal. 

A simple but crude test of the hypothetical magnitude (H) of a population
mean is described by Knudsen (31). The statistic is simply (H − x̄)/Range.
The 5 percent and 1 percent critical points for this statistic are
tabled for sample numbers from 3 thru 30 and for 40, 60, 120, and 500. 

Gumbel (22) discussed the lack of reliability of the Chi-square test of 
goodness of fit of an observed distribution of a continuous variate. He 
pointed out that two equally competent statisticians working with the 
same data might be led to adopt different conclusions on the basis of this 
test as a result of (a) making different choices of intervals, (b) adopting 
different starting points for the first interval, and (c) following different 
procedures of combining end intervals to increase the expected frequen- 
cies for these intervals. Gumbel also pointed out that the effects of combin- 
ing end intervals (i.e., a reduction in the magnitude of Chi-square and 
reduction in the degrees of freedom), while counteracting, are not neces- 
sarily equally potent, and that this practice, moreover, violates the postu- 
late that all intervals be of the same magnitude. 

Grubbs (21) has shown that the sampling distribution of the radial
standard deviation is the Chi-square distribution for 2n − 2 degrees of
freedom. The radial standard deviation, which has not as yet been applied
in educational research, is used in ballistics to measure the accuracy of
rifle fire. It is defined as

\[ \left[ \frac{1}{n} \left( \sum_{i=1}^{n} (x_i - \bar{x})^2 + \sum_{i=1}^{n} (y_i - \bar{y})^2 \right) \right]^{1/2}, \]

where x_i and y_i are the abscissa and ordinate of the i-th point measured from
an arbitrary origin, and where n is the number of points. The derivation
of the sampling distribution cited assumes that the variance of the x-popu-
lation equals the variance of the y-population.

12 See also the article by Wald (51) reviewed in the section on problems of prediction.

It is frequently desired to determine values of Chi-square or of t which 
have not been tabled. Peiser (37) has developed simple formulas for 
approximating values of Chi-square and t for any given number of degrees 
of freedom and for any given percent point. These formulas may also be 
used to approximate the percent points corresponding to obtained values 
of Chi-square or t given the number of degrees of freedom. 

Only two studies dealing with problems of sampling will be cited. 
Hansen and Hurwitz (23) have written an article covering, in a rather 
comprehensive fashion, developments in the theory of sampling from finite 
populations. Their discussion covered problems of subsampling and of 
estimation in various subsampling systems. Madow and Madow (34)
discussed the problem of systematic sampling (i.e., a sample picked by
choosing a starting point and then selecting every k-th element until the
desired number of elements is obtained) and of estimation based upon
such samples.
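The selection rule for a systematic sample, as defined in the parenthesis above, may be sketched in a few lines. The function name and arguments are hypothetical; a random starting point within the first k elements is assumed when none is supplied.

```python
import random


def systematic_sample(population, k, size=None, start=None, rng=random):
    """Select every k-th element of a list from a starting point.

    start : index of the first element chosen; drawn at random from
            0, ..., k - 1 when not given.
    size  : optional cap on the number of elements returned.
    """
    if start is None:
        start = rng.randrange(k)
    chosen = population[start::k]
    return chosen if size is None else chosen[:size]
```

For instance, with a list of twenty elements, k = 5, and a start at the third element, the sample consists of the third, eighth, thirteenth, and eighteenth elements.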


CHAPTER VIII


Computational Technics 
IRVING LORGE 


Many of the more significant developments in computational technic
overlap basic contributions to statistical theory. It is difficult, indeed, to
make clear-cut distinctions between computational processes and statistical
innovations. Some of the studies reviewed, therefore, must illustrate both 
theoretical and computational processes. During the past three years, the 
articles showing computational development deal primarily with the 
utilization of machines (particularly those of the International Business 
Machines Corporation) for mass processing of data, with methods for 
solving simultaneous equations, with methods of using analysis of variance, 
and with simplifications and extensions of factor methods. It must be 
recognized, however, that many contributions to mass methods of handling 
data developed or extended by the armed services may not be reviewed 
at present since these methods are classified as “restricted” or “confiden- 
tial.” There is good reason, however, to believe that in the next three years 
the computational methods developed in the armed services will become 
available to the research worker. 


General 


Most significant is the appearance of a quarterly journal, Mathematical 
Tables and Other Aids to Computation (2). The new journal is a clearing- 
house for information about mathematical tables as they are developed, 
errors in published tables, and machine aids to computation. The educa- 
tional researcher should be particularly interested in tables reviewed by 
Committee K and in the explanation of recent developments in calculating 
machines and in mechanical computation by Committee Z. The following
references to Recent Mathematical Tables in recent issues should be helpful
(2:91, April 1943; 2:101, July 1943; 2:108, 109, 110, 111, 112, October
1943; 2:129, 130, January 1944; and 2:164, October 1944).

For those who use the various machines of the International Business 
Machines Corporation, the Pointers frequently give illustrations for adapt- 
ing the machines to various statistical computational processes (25). The 
Pointers are particularly rich in applications adapted to the tabulator and 
the multiplier. Frame (16) described devices for solving algebraic equa- 
tions covering graphics, kinematic linkage, dynamic balances, hydro- 
static balances, electric and electromagnetic adaptations, harmonic ana- 
lyzers, and calculating machines. 


Tables, Graphs, and Nomograms 


In addition to the tables cited in Mathematical Tables and Other Aids 
to Computation (2), Hayes (19) has prepared tables of the standard 
error of the tetrachoric correlation coefficient for argument from .00 to
.90 in steps of .10 and for .95 with cuts at 50, 30.9, 15.9, 5.48, 2.28 and
0.466 percent. Anderson and Houseman (1) have developed tables which 
will greatly facilitate polynomial curve fitting. Casanova (8) has developed 
tables and charts for weighting subtests into a general sum where the 
weights are functions of test length and reliability. 
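
The six cut percentages listed for the tetrachoric tables appear to be the upper-tail areas of the unit normal curve at deviates of 0, 0.5, 1.0, 1.6, 2.0, and 2.6. That reading is an inference from the numbers, not something stated in the text, and the short check below is offered only to make the inference testable. The function name `upper_tail_percent` is chosen here for illustration.

```python
import math


def upper_tail_percent(z):
    """Percent of the unit normal distribution lying above the deviate z."""
    return 100.0 * 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Assumed deviates; the source gives only the percentages.
    for z in (0.0, 0.5, 1.0, 1.6, 2.0, 2.6):
        print(f"z = {z:3.1f}  upper tail = {upper_tail_percent(z):.3f} percent")
```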

Recent nomographs and graphs are those developed by Bliss (5) on the 
chi-square distribution, by Lord (33) for computing the fourfold point 
biserial correlation, by Paschal (35) for solving partial correlation 
problems, and by Jurgensen (29) for obtaining centiles. 


Tabulating Machines 


Benjamin (3), utilizing the method of “counter rolling” on the I.B.M. 
tabulator (in the absence of card cycle total transfer), has developed a 
procedure for computing the sums of squares and cross products without 
the use of summary punching or manual addition. Unfortunately, the
method is not practical beyond two-variable problems. Bloom and Lubin
(6) showed how the graphic item counter of the I.B.M. test scoring machine 
may be used to obtain Pearson Product-Moment correlation coefficients. 
Grossman (17) illustrated how the test scoring machine may be used to 
obtain weighted scores. He adapted the line length of the test scoring 
blank to weight the scores. 
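
The quantities these machine procedures accumulate (the count, the sums, the sums of squares, and the sum of cross products) are all that is needed for the Pearson product-moment coefficient. The sketch below shows that arithmetic in Python. It illustrates the raw-score formula only and makes no attempt to reproduce the tabulator or test scoring machine routines themselves; the function names are chosen here for illustration.

```python
import math


def accumulate_sums(xs, ys):
    """Return N, sum X, sum Y, sum X^2, sum Y^2, and sum XY in a single pass."""
    n = sx = sy = sxx = syy = sxy = 0
    for x, y in zip(xs, ys):
        n += 1
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
    return n, sx, sy, sxx, syy, sxy


def pearson_r(xs, ys):
    """Pearson r from the raw-score formula, using only the accumulated sums."""
    n, sx, sy, sxx, syy, sxy = accumulate_sums(xs, ys)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical scores on two tests for five pupils.
    print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```

Because the formula needs only running totals, each record is read once and no intermediate values have to be written down, which is the property the machine methods exploit.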

A significant study of errors in card punching was made in the Bureau 
of the Census. Deming, Tepping, and Geoffrey (10) analyzed 25,000 
wrongly punched cards. Of the erroneous cards 86 percent had only one 
mistake, 9 percent had two mistakes, and 5 percent had three or more 
mistakes. The mistakes were classified as machine errors (failure to skip) 
and operator errors (failing to include a field, repeating a field,
interchange of numbers, etc.). Operator errors predominate in perseveration
of usual or majority punching, i.e., the operator tends to use the more
frequent punches in cards where an atypical or unusual punching is required.
Deming and his co-workers, however, indicated that errors tend to
compensate.

Much of the material deals with the use of tabulating machines for 
the preparation of tables. Herget and Clemence (20) suggested the use of 
modified second order or higher differences to reduce the labor of pre- 
paring linearly interpolable tables. Extending this idea, Miller (34) pointed 
to a further generalization which reduced the amount of work in table 
preparation. King (30) described his method of tabling exponential
functions; Thomas and King (42), a method of tabling logarithms; and Knudsen
(31), a method of obtaining the coefficients for orthogonal polynomials
which requires 20 percent of the time needed in Warren’s technic.
Kormes (32) discussed a method applicable to the I.B.M. and Remington-Rand
machines for obtaining numerical solutions of finite difference
equations.

Watkins (46) illustrated coding technics to increase the speed and 
efficiency of class and school record keeping. 


Simultaneous Equations 


Much work has been published giving the logic and process of matrix 
calculation. An extraordinarily useful exposition was that of Hotelling
(23, 24), who gave modern methods of solving linear equations and of
computing determinants and inverses of matrices. Special attention was given
to iterative methods and to means for accelerating convergence. Hotelling
showed the importance of considering rounding errors in computing.
Tuckerman (45) suggested that each unknown be found in the form 
x(1+E) to estimate the computational uncertainty. Dwyer has continued 
his significant publications on the value of the “abbreviated Doolittle.” 
He reviewed the methods of solution of problems of multiple and partial 
correlation and regression with indications of solutions or related equa- 
tions and identification of related statistics (13) together with a bibliog- 
raphy of thirty-seven titles on related work. The validity and value of the 
“abbreviated Doolittle” (14, 15) and the ease of the compact solution 
(11, 15) were treated adequately. 
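
An iterative method of the general kind mentioned above can be sketched as follows. The scheme shown, which repeatedly replaces an approximate inverse X by X(2I - AX), is a well-known iteration often associated with Hotelling's name. Whether it matches the cited papers step for step is an assumption, and the starting approximation used here (the transpose of A divided by the product of its largest column sum and largest row sum of absolute values) is one common choice rather than anything taken from the source.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def iterative_inverse(a, iterations=30):
    """Approximate the inverse of a square matrix by X <- X(2I - AX)."""
    n = len(a)
    col_norm = max(sum(abs(a[i][j]) for i in range(n)) for j in range(n))
    row_norm = max(sum(abs(v) for v in row) for row in a)
    # Starting value: scaled transpose, small enough for the iteration to converge
    # when A is nonsingular.
    x = [[a[j][i] / (col_norm * row_norm) for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        ax = matmul(a, x)
        correction = [[(2.0 if i == j else 0.0) - ax[i][j] for j in range(n)]
                      for i in range(n)]
        x = matmul(x, correction)
    return x


def solve(a, b, iterations=30):
    """Solve Ax = b by multiplying b by the iterated inverse."""
    x = iterative_inverse(a, iterations)
    return [sum(x[i][j] * b[j] for j in range(len(b))) for i in range(len(b))]


if __name__ == "__main__":
    # Hypothetical pair of simultaneous equations: 4x + y = 1, 2x + 3y = 2.
    print(solve([[4.0, 1.0], [2.0, 3.0]], [1.0, 2.0]))
```

Each pass roughly squares the remaining error, so a poor first approximation is corrected rapidly. A fixed number of passes is used here for simplicity; a practical routine would stop when the correction becomes negligible.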

Hoel (21) showed the essential identity of standard routines for
computing the inverse of the matrix; Samuelson (36, 37) developed a
more efficient method for determining the coefficients of the characteristic
equation. In addition, Spoerl (41) and Bingham (4) gave
procedures for solving the matrix. Jackson (26) discussed several methods
for obtaining approximate multiple regression weights, and Sandomire 
(38) gave a table of factors to obtain successive cumulative sums without 
intermediate recording. 
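
To make concrete what "the coefficients of the characteristic equation" are, the sketch below computes them for a small matrix. It does not reproduce Samuelson's method. It uses a different, standard routine (commonly called the Faddeev-LeVerrier recurrence), chosen here only because it is short and exact with rational arithmetic. The function names are illustrative.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def characteristic_coefficients(matrix):
    """Coefficients of det(tI - A), highest power of t first.

    Uses the Faddeev-LeVerrier recurrence (a swapped-in standard technique,
    not the method of the study cited above).
    """
    n = len(matrix)
    a = [[Fraction(v) for v in row] for row in matrix]
    m = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    coefficients = [Fraction(1)]
    for k in range(1, n + 1):
        am = matmul(a, m)
        c = -sum(am[i][i] for i in range(n)) / k  # minus trace, divided by step
        coefficients.append(c)
        m = [[am[i][j] + (c if i == j else 0) for j in range(n)]
             for i in range(n)]
    return coefficients


if __name__ == "__main__":
    # Hypothetical 2 x 2 matrix; its characteristic equation is t^2 - 5t + 5 = 0.
    print(characteristic_coefficients([[2, 1], [1, 3]]))
```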

Dwyer (12) discussed grouping errors and suggested methods adaptable
to the I.B.M. tabulator to reduce these errors. Day and Sandomire (9)
illustrated the use of Fisher’s discriminant function to distinguish among
more than two groups.


Analysis of Variance 


The fundamental principles underlying analysis of variance designs,
their construction, and their numerical solution were given by Satterthwaite
(39). Johnson and Tsao applied analysis of variance to a problem of
estimating differential limen values (28), giving a complete analysis of a
4x7x2x2x2 pattern, and to a problem in the study of individual educational
development (27), giving a complete analysis of a 2x3x3x3 pattern. Butsch (7)
has developed a work sheet, using logarithms, for the Johnson-Neyman
method, and Schultz (40) has adapted analysis of variance technics to
ranked data.
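
The numerical core of any analysis of variance is the partition of the total sum of squares into parts attributable to the sources of variation. The sketch below shows that partition for the simplest case, a single classification. The factorial patterns cited above involve many more components, and nothing here is taken from those studies. The function name and the sample scores are hypothetical.

```python
def one_way_anova(groups):
    """Partition the total sum of squares for a single classification."""
    all_values = [v for g in groups for v in g]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total

    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    # Between groups: each group mean's squared deviation, weighted by group size.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within groups: deviations of scores from their own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "f_ratio": f_ratio,
    }


if __name__ == "__main__":
    # Hypothetical scores for three groups of three pupils each.
    print(one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

The between and within sums of squares add up to the total, which provides a convenient arithmetic check on hand or machine computation.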


Factor Analysis 


Thurstone has developed a new factor analysis method and given the
computational procedure for estimating a factor matrix, eliminating the
necessity for calculating successive residual matrices (43). Essentially, the
method yields clusters of test vectors and is in reality a multiple group
method of factoring the matrix. Holzinger (22), too, has developed a
simple factor method based on the idea of substructuring the matrix. 
Tucker (44), using the method of bordering the original correlation 
matrix with a new row and column for each component, developed a 
computational procedure that eliminated the labor of obtaining residuals. 
An interesting adaptation of Tucker’s basic procedure to the I.B.M. 
tabulator and calculating machines (19) showed that factor analysis 
can be adapted to mass methods efficiently and accurately. The study gave 
a complete example together with wiring diagrams, forms, and calcula- 
tions. 


Abacus, use in arithmetic, 281 

Absenteeism, in junior high, 114 

Acceleration, and superior students, 117; 
of pupils, 115 

Accidents, 88 

Achievement, measures, 410 

Acoustics, 55 

Activity education, 206 

Adjustment, factors affecting, 103 

Adolescence, 105 

Adult education, 364 

Agriculture, 180 

Algebra, and general science, 293; pre- 
diction, 313 

American Council on Education, 413 

Analysis, factor, 443; of case material, 
355; of covariance, 383; of texts in 
research methods, 346; of variance, 
380, 443; of social needs, 6 

Anecdotal records, 119, 400 

Appraisal, controversy, 360; in armed 
services, 143; in industry, 148; in 
Marine Corps, 147; in Navy, 146; of 
individual, 138; of materials, 368 

Area curriculum, in science, 324 

Architects, 55, 58 

Arithmetic, appraisal of practice, 281; 
college, 312; combinations, 284; com- 
mercial, 312; diagnosis, 280; division, 
284; grade placement, 279; high-school, 
310; junior college, 312; meanings, 
280; out-of-school uses, 278; problem 
solving, 282; readiness, 279; remedial, 
280, 299, 310; research, 285; teaching, 
282; see also mathematics 

Armed Forces Institute, 304; tests, 413 

Armed forces, research by, 243 

Armed services, guidance in, 108, 143; 
personnel work, 135 

Army, 144; personnel training, 189 

Art rooms, college, 36 

AST programs, chemistry, 304 

Articulation, between secondary school 
and college, 133 

Atmospheric quality, 51 

Attendance, factors affecting, 113; hand- 
books, 115; means of improving, 114; 
organization for, 114; school, 112 

Attitudes, and social learning, 235; effects 
of reading, 260; studies, 365 

Audiometer, 403 

Auditory aids, 243 


Basic course idea, 208 

Behavior rating scales, 398 

Bibliographical aids, 336 

Bibliographies, films, 340 

Biographies, 338 

Biology, 305 

Boiler efficiency, 38 

Bonded debt, 79 

Bonds, 78, 79; legal aspects, 86 

Book lists, 338, 340 

Building cost indexes, 78 

Buildings. See School plants 

Burned buildings, replacement, 25 

Business, appraisal in, 148; guidance in, 
108 

Business education, equipment, 29 


Campaign for school support, 80 

Campus plans, 34 

Capital outlays, extent, 78; federal sup- 
port, 24; legal aspects, 85 

Card, catalog, 338; punching machines, 


Case studies, 400 

Case study, 141, 352; trends, 356; values, 
352 

Catalog, card, 338; film, 340 

Census, and occupational trends, 177 

Census information, 173; limitations, 174 

Ceramics, 179 

Certification, 328 

Check lists, 142 

Chemistry, 304 

Chemists, 180 

Childhood education, curriculum, 205 

Chi-square technic, 386 

Civil aeronautics authority, 147 

Classroom design, 15, 56 

Classrooms, college, 34 

Codes, building, 59 

College, mathematics, 310; students, per- 
sonnel services, 134; surveys, 366 

College students, surveys, 367 

Colleges, accelerated programs, 115; 
plant facilities, 34; and universities, 
curriculum, 210; objectives, 198 

Color, and lighting, 43 

Comics, 260 

Comic strips, 160 

Commercial arithmetic, 312 

Commission on teacher education, 321 

Community centers, 7; needs, 324 

Community Facilities Act, 24 

Community study, preservice, 325 

Community use of school plant, 8 

Computational technics, 441 

Consumer research, 395 

Contracts, for school plants, 87 

Cooperative planning, 21 

Cooperative studies, 212 

Core courses, 208 

Core curriculum, 208 

Costs, school plant, 77, 78; unit, 79 

Counseling, evaluation, 121, 187; in Army, 
189; outcomes, 122; processes, 155; 
programs evaluation, 121; results, 159; 
varying points of view, 155 

Counselors, duties, 188; duties in college, 
185; in government agencies, 188; in 
industry, 186; in Navy, 188; personal 
characteristics, 186; preparation, 185, 
186 

Covariance, 384 

Credit for Armed Forces Tests, 413 

Cumulative records, 119, 142 

Curriculum, 205; broad-fields, 208; child- 
hood education, 205; higher education, 
210; secondary education, 207; teach- 
ers colleges, 211 

Custodial personnel, 61 

Custodians, dress, 63; relations with 
teachers and pupils, 66; salaries, 62; 
training, 61; work schedule, 62 


Debt limitations for school plant, 85 

Decorating, 65 

Delinquency, 235 

Democratic planning, 18 

Dentistry, 180 

Depreciation, 73 

Descriptive statistics, 430 

Design, trends, 56 

Diagnosis, in junior college mathematics, 
313 

Dictionary, of education, 341 

Directories, biographical, 341 

Division, 284 

Documentary, analysis, 344; research, 344 

Dormitories, college, 37 

Duties, of personnel workers, 185 


Eclecticism, 200 

Educational information, 173 
Educational philosophy, 196 
Efficiency rating, 399 
Eight-Year Study, 212 
Elementary-school guidance, 101 
Elementary schools, 13 


Empathy, 397 

Employe ratings, 399 

Employers, school records needed by, 120 

Encyclopedias of education, 341 

Engineering, 179 

Enrolments, trends, 112 

Environment, and guidance, 105 

Equipment, lighting, 48; mechanical, 66; 
needs, 26; vocational education, 29 

Estimation, statistical, 427 

Evaluation, of counseling programs, 121; 
of guidance in secondary schools, 132; 
of guidance programs in college, 134; 
of personnel programs, 131; of tests, 
411; of test technics, 441 

Evaluation studies, 360; problems, 361 

Evaluative, criteria for guidance pro- 
grams, 121 

Excursions, 222, 244 

Exhibits, library, 258 

Experimental studies, 362 


Factor analysis, 443 

Failure, causes, 117 

Fatigue, and learning, 228 

Federal, control of education, 25; funds 
for building, 80; support for school 
plant, 24 

Federal Works Agency, 24 

Field trips, 244 

Films, guides to, 340 

Fire insurance, 71; losses, 71; preven- 
tion, 65 

Floors, maintenance, 64 

FM radio, 252 

Follow-up studies, 132, 133, 159, 365 

Forgetting, 230 

Frequency studies, 368 

Functional planning, 13 


General, education, 208; mathematics, in 
college, 315; science, and algebra, 293 

Geometry, 313, 317 

Gifted children, educational provisions 
for, 117 

Government, guidance in, 143 

Graphs, 441 

Group, discussion, 168; methods, 221; 
therapy, 164, 168 

Growth and development, of individuals, 
102 

Guidance and counseling, in adult educa- 
tion, 107; in armed services, 108, 143; 
in Army, 143; in elementary school, 
101; in government, 143; in higher 
education, 106; in industry, 108; in 
Marine Corps, 147; in Navy, 146; in 
preschool, 101; in reading, 259; in 
secondary school, 104; needed research, 
169, 190; preparation of workers, 185; 
thru groups, 164; use of tests, 138 

Guidance programs, college, 134; elemen- 
tary, 131; in armed services, 135; in 
government, 134; in industry, 134; sec- 
ondary, 132 

Gymnasiums, 64 


Health, and attendance, 113; and physi- 
cal characteristics, 104; and scholastic 
achievement, 104; facilities, college, 37 

Heating, 51, 55 

High-school pupils, 104 

Higher education, plant facilities, 34; 
surveys, 367 

Historical, research, 344; studies in mathematics, 276 

Historiography, 344 

Home economics, equipment, 30 

Housekeeping, 63 


Indexes, book, 337 

Individual methods, 221 

Industrial education, equipment, 30 

Industry, appraisal in, 148; duties of per- 
sonnel workers, 186; guidance in, 108 

Inference, statistical, 431 

Insurance, legal aspects, 88; liability, 90; 
programs, administration, 72; school 
plants, 71 

Interest studies, in science, 303 

International Business Machines, 402, 442 

Interviews, 395, 399; evaluation, 157; for 
appraisal, 139 


Janitors. See Custodians 

Job, families, 179; rating, 398 
Journals, educational, 341 

Junior college, plant facilities, 39 


Labor force, 174 

Laboratories, college, 35 

Lanham Act, 24, 80 

Latin square technic, 382 

Leadership, 167, 169 

Learning, and motivation, 228; studies of, 
227 

Legal, aspects of school plants, 83; litera- 
ture, indexes, 337; research, 345; serv- 
ice, 180 

Liability, 83, 88 

Libraries, 256; administration, 257; bib- 
liography, 256; college, 35; elementary- 
school, 257; evaluation, 258, 263; ex- 
hibits, 258; needed research, 264; re- 
sources, 336; routine, 259; surveys, 
263; use of, 261 

Lighting, 41; brightness and glare, 43; 
fluorescent, 47, 56; present status, 41; 
shop, 31; trends, 41 


Magazines, educational, 341 

Maintenance, of floors, 64; roofs and 
walls, 65 

Marine Corps, 147 

Market research, 395 

Materials, trends, 54 

Mathematics, and human relations, 168; 
achievement, 316; attitudes and inter- 
est, 316; college, 310, 315; courses of 
study, 276; curriculum, 277, 316; diag- 
nosis, 313, 314; elementary school, 276; 
general, 315; guidance, 314; high- 
school, 310; historical studies, 276; jun- 
ior college, 314; junior high school, 
298; measurement, 278; methods, 299; 
nature of learning, 277; predictions, 
314; teacher education, 321; vocabulary 
studies, 277; see also arithmetic 

Maturation, 102 

Mechanical devices, 401 

Medical service, 180 

Memory, 230 

Mental ability, and achievement, 104 

Methods of research, analysis, 377 

Methods of teaching, 218; aural, 222 

Metronoscope, 402 

Microfilms, 342 

Motion pictures, 243; out-of-school, 250, 
290; guides, 340 

Motivation, 228 

Motor skills, acquisition, 229 

Multiple factor analysis, 388 

Multiplication combinations, 284 

Museum materials, 249 

Music rooms, college, 36 


National testing programs, 415 

Navy, 146; personnel training, 188 

Need, determination of, 10; equipment, 
26; school plant, 80 

Needed research, curriculum, 213; in 
group guidance, 169; in plant costs, 81; 
in training of guidance workers, 190; 
libraries, 264; on school plant, 8 

Needs, equipment, 26; school plant, 80 

Negroes, 142; attendance, 113; opportu- 
nities, 210; teachers colleges’ curricu- 
lums, 211 

Newer technics of research, 379 
New York Times test, 415 
Nomographs, 441 
Nonpromotion of pupils, 116 
Notebooks, in science, 293 
Nurses, 123, 139 

Nursing, 179 


Objectives, in teacher preparation, 328 

Observation, for appraisal, 140; studies, 
400 

Observational technics, 394 

Occupational groups, 176 

Occupations, analysis, 179; conditions 
and requirements, 179; distribution by, 
175; distribution within states, 177 

Office equipment, 29 

Operation of school plant, 61 

Ophthalmograph, 402 

Opinion research, 395 


Painting, 65 

Percentage, teaching, 299 

Periodicals, history of, 345; in teaching 
science, 302; lists and indexes, 339; 
new, 341 

Persistence in school, 112 

Personal documents, analysis, 141 

Personality, and adjustment, 103; and 
adjustment in college, 107; and adjust- 
ment in secondary schools, 105; tests, 
142 

Personnel, custodial, 61; training, trends, 
189; work, conditions affecting, 112; in 
armed services, 135; in industry and 
government, 134; programs, 131 

Philosophy, 196 

Phonographic recording, of interviews, 
157, 160 

Phonographs, 246 

Photographic reproduction, 342 

Physical development, of college stu- 
dents, 106; of high-school students, 104 

Physical education, college plant facili- 
ties, 36 

Plastics, 55 

Playgrounds. See Sites 

Plywood, 55 

Population data, 175 

Postwar, school plants, 57; suggested oc- 
cupations for veterans, 180 

Pragmatism, 198 

Prediction, of academic success, 139; of 
college success, 106; statistical, 423 

Prefabrication, 58 

Preinduction training, in mathematics, 
318 








Preparation, of guidance workers, 185 

Preschool children, 101 

Preservice teacher education, 325 

Problem solving, 234; in arithmetic, 291 

Prognosis, in junior college mathematics, 
313 

Progressive Education Association, 199 

Projective technics, 140 

Promotion of pupils, 115 

Property, legal use, 84 

Psychotherapy, 158, 168; research prob- 
lems, 161 

Public relations, 20 

Publicity and school support, 80 

Pupil records and reports, 118 


Questionnaires, 142; school practices, 
362 


Radio, appraisal, 368; availability, 244; 
effectiveness, 246; equipment, 244; re- 
cordings, 246 

Rating technics, 396 

Readiness, in arithmetic, 279 


Reading, as related to science, 289; interests, 260 
Records, personnel, 142 
Reference works, 337 
Regional testing programs, 415 
Remedial work, in arithmetic, 280 
Report cards, 142 
Research, and philosophy, 196; documen- 
tary, 344; legal, 345; methodology, 340 
Residence halls, college, 37 
Restrictions, wartime, 26 
Roofs, maintenance, 65 
Rorschach test, 140 


Safety, 65 

Sampling technics, used by census, 175 

Scales, behavior, 398; efficiency, 399; 
socioeconomics, 397; teacher rating, 


Scholarship, and student activity, 165 

School plants, availability, 7; codes, 59; 
construction, 58; costs, 78; financial 
aspects, 77; financial restrictions, 85; 
flexibility, 59; insurance, 71; junior 
college, 39; legal nature, 83; legal 
ownership, 86; needs, 10, 11, 24; plan- 
ning, 13; postwar, 57; social implica- 
tions, 6; temporary, 58; value, 77; war- 
time needs, 27 

Science, certification, 328; college, 304; 
course enrichment, 294; course mate- 
rials, 274; difficulties in teaching, 274; 
elementary school, 272; experimental 
background, 272; general, 293; high- 
school, 301; interest studies, 303; junior 
college, 301; junior high-school, 289, 
295; methods in high-school, 302; note- 
books, 293; objectives, 301; organiza- 
tion, 301; outcomes, 304, 305; problem 
solving, 291; study methods, 274; teach- 
ing aids, 302; teacher education, 306, 
321; textbooks, 291; trends in teaching, 
306; visual aids, 290; vocabulary 
studies, 306 

Scientific, attitudes, 292; methods, 292 

Score cards, for college buildings, 34 

Scoring of tests, 410 

Secondary schools, 14 

Self-appraisals, 362 

Shock therapy, 159 

Simultaneous equations, 443 

Sites, area, 6 

Slow learners, 116 

Social, living, 208; maturity, 142; signifi- 
cance of school plant, 6 

Socioeconomic scales, 397 

Sociology, research methods in, 353 

Sociometry, 166 

Sound recording, 160 

Special rooms, 14 

Speech ratings, 397 

Staff participation in planning, 20 

State, insurance systems, 73; regulations, 
59 

Statewide testing programs, 414 

Statistical inference, problems, 431 

Statistics, descriptive, 430; in history, 
344; theory, 423 

Student activities, surveys, 164 

Student activity, and scholarship, 165 

Study, habits, 274; procedures, 223 

Success, prediction by tests, 139 

Superior students, 117 

Surveys, 360; bibliography, 366; cur- 
riculum, 212; guidance, 132, 178; of 
student activities, 164; school plant, 10 

Survey, technics, 394; visual, 364 

Swimming pool, 21 

Syndrome, 355 


Tables, statistical, 441 

Tabulating machines, 401, 442 

Taxation, for school plant, 80 

Teacher education, 321; in science, 306; 
in-service, 321, 322; preservice, 325; 
recommendations, 328; studies, 362 

Teacher ratings, 396 




Teacher training, plant facilities, 38 

Teachers colleges, curriculum, 211 

Teaching profession, 180 

Technics of research, analysis, 377 

Telebinocular, 364, 403 

Temporary buildings, 25 

Test construction, 409 

Testing programs, 363; national, 415; 
regional, 415; statewide, 414 

Test results, evaluation, 412 

Test technics, evaluation, 411 

Tests and measurement, literature, 408; 
trends, 409 

Tests, Armed Forces Institute, 413; scor- 
ing, 410; use in guidance, 138; valida- 
tion, 410 

Test-scoring machines, 401 

Test scores, factors affecting, 411 

Textbooks, analysis, 292; illustrations, 
291; in research methods, 346 

Theaters, college, 36 

Theology, and education, 199 

Thermal balance, 51 

Theses, lists, 339 

Times test, 415 

Torts, 88 

Transfer, of training, 231 

Transfer students, 120 

Trend studies, 360, 364 

Trends, in personnel training, 189; in 
teaching science, 306 


Unit costs, 79 
U. S. Office of Education reports, 342 


Validation of tests, 410 

Value of school plants, 77 

Variance, analysis, 380, 443 

Ventilation, 51 

Veterans, suggested occupations, 180 

Visual aids, 243, 290; equipment, 245; 
evaluation, 247, 290; facilities for, 15 

Vocabulary, frequency studies in mathe- 
matics, 318 

Vocabulary studies, 277; in science, 306 

Vocational education, plant facilities, 29 

Vocational, information, 173; opportuni- 
ties, 178 


War Manpower Commission, 178, 188 

Wartime, needs, 24 

Work schedule, for custodians, 62 

Workers, distribution, 176; distribution 
within states, 177; number, 175 